After a little delay, the team finished work on the 0.13 release of scikit-learn.
There is also a user survey that we launched in parallel with the release, to get some feedback from our users.
There is a list of changes and new features on the website.
You can upgrade with either pip or easy_install:
pip install -U scikit-learn
easy_install -U scikit-learn
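If you want to make sure the upgrade actually took effect, a quick check from Python is to print the installed version. This is just a small sketch; the exact version string you see depends on what your installer picked up.

```python
import sklearn

# The version string of the installed package, e.g. one starting with "0.13"
# right after upgrading to this release.
installed_version = sklearn.__version__
print(installed_version)
```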
There were more than 60 people contributing to this release, with 24 people having 10 commits or more.
Again, many improvements are behind the scenes or barely noticeable. We improved test coverage a lot, and parameter names are much more consistent now. There is now also a user guide entry for the classification metrics, and their naming was improved.
This was one of the many improvements by Arnaud Joly, who joined the project very recently but nevertheless wound up with the second most commits in this release!
Now let me get to some of the more visible highlights of this release from my perspective:
- Thanks to Lars and Olivier, the Hashing Trick finally made it into scikit-learn.
This allows for very fast vectorization of large text corpora and stateless transformers for the same.
- Sample weights were added to the tree module thanks to Noel and Gilles. This enabled a smarter resampling implementation for random forests, which speeds them up by up to a factor of two! It is also the basis for including AdaBoost with trees in the next release.
- I added a method to use totally randomized trees for hashing / embedding features to a high-dimensional, sparse binary representation. It goes along the lines of my last blog post on using non-linear embeddings followed by simple linear classifiers.
- I also added Nystroem kernel approximations, which are really easy to do but should come in quite handy. They still need some more work, though. For details, see my post on kernel approximations.
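To give a feel for three of the highlights above, here is a minimal sketch using `HashingVectorizer`, `RandomTreesEmbedding` and `Nystroem`. The toy documents, the `make_circles` dataset, the parameter values and the choice of `SGDClassifier` as the linear model are all my own assumptions for illustration, and defaults may differ between scikit-learn versions.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier

# 1. Hashing trick: the vectorizer is stateless, so there is no fit step
#    and no vocabulary kept in memory. (Toy documents, assumed for the example.)
docs = ["the hashing trick is fast", "no vocabulary is stored in memory"]
vectorizer = HashingVectorizer(n_features=2 ** 10)
X_text = vectorizer.transform(docs)  # sparse matrix with 2 rows, 1024 columns

# A small non-linear toy problem (assumed dataset, not from the release notes).
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

# 2. Totally randomized trees: each sample is encoded by the leaf it falls
#    into in every tree, giving a sparse binary representation.
embedder = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)
X_embedded = embedder.fit_transform(X)

# 3. Nystroem kernel approximation followed by a simple linear classifier.
nystroem = Nystroem(kernel="rbf", gamma=2.0, n_components=50, random_state=0)
X_kernel = nystroem.fit_transform(X)
clf = SGDClassifier(random_state=0).fit(X_kernel, y)
```

In the embedding, every row has exactly one active entry per tree, so it can be fed directly to a linear model in the same way as the Nystroem features.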
Thanks to the team for working on this together. I am really happy with the way everybody joins forces. This is an amazing project!