I'm a bit obsessed with MNIST.
Mainly because I think it should not be used in any papers any more - it is weird for a lot of reasons.
When preparing the workshop we held yesterday, I noticed one that I wasn't aware of yet: most of the 1-vs-1 subproblems are really easy!
Basically all pairs of numbers can be separated perfectly using a linear classifier!
And even if you just do a PCA down to two dimensions, they can pretty much still be linearly separated! It doesn't get much easier than that. This makes me even more sceptical about "feature learning" results on this dataset.
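To make this concrete, here is a minimal check of the kind I mean. It uses scikit-learn's small built-in 8x8 digits dataset as a stand-in for MNIST (so the exact numbers will differ, but the pattern is the same): fit a linear classifier on a single pair of digits, once on the raw pixels and once after a PCA to two dimensions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for MNIST: scikit-learn's small 8x8 digits dataset.
X, y = load_digits(return_X_y=True)

def pair_accuracy(a, b, n_components=None):
    """Cross-validated accuracy of a linear classifier on the a-vs-b
    subproblem, optionally after projecting onto the first
    n_components principal components."""
    mask = (y == a) | (y == b)
    Xp, yp = X[mask], y[mask]
    if n_components is not None:
        Xp = PCA(n_components=n_components).fit_transform(Xp)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, Xp, yp, cv=5).mean()

print("0 vs 1, raw pixels:", pair_accuracy(0, 1))
print("0 vs 1, 2-D PCA:   ", pair_accuracy(0, 1, n_components=2))
```

On this small dataset the easy pairs come out essentially perfect, and even after squashing everything down to two PCA components a linear classifier barely loses anything.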
To illustrate my point, here are all pairwise PCA projections. The image is pretty huge. Otherwise you wouldn't be able to make out individual data points.
You can generate it using this very simple gist.
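For readers who don't want to click through, here is a sketch of what such a script might look like (this is not the gist itself, and it again uses the small 8x8 digits dataset instead of MNIST): for each of the 45 digit pairs, project that pair's samples onto their first two principal components and scatter-plot them in a grid.

```python
import itertools
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)
pairs = list(itertools.combinations(range(10), 2))  # all 45 digit pairs

fig, axes = plt.subplots(5, 9, figsize=(27, 15))
for ax, (a, b) in zip(axes.ravel(), pairs):
    mask = (y == a) | (y == b)
    # PCA is fit per pair, so each panel shows the directions that
    # matter for that particular subproblem.
    proj = PCA(n_components=2).fit_transform(X[mask])
    ax.scatter(proj[:, 0], proj[:, 1], c=(y[mask] == b), cmap="coolwarm", s=5)
    ax.set_title(f"{a} vs {b}")
    ax.set_xticks([])
    ax.set_yticks([])
fig.savefig("pairwise_pca.png", dpi=100)
```

Note that the PCA is refit for every pair; projecting all ten classes through one shared PCA would look considerably messier.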
There are some classes that are not obviously separated: 3 vs 5, 4 vs 9, 5 vs 8 and 7 vs 9. But keep in mind, this is just a PCA to two dimensions. It doesn't mean that they couldn't be separated linearly in the original space.
Interestingly, the "1"s are very easy to identify; even with seven and nine there is basically no way to confuse them. The ones have a somewhat peculiar shape, though. It would be fun to see what a tour along the "bow" (see img at [2, 2]) would look like.
Manifold-people should be delighted ;)
I think this plot emphasizes again: look at your data!
I hope you enjoyed this perspective.