Handwritten digit recognition with nonlinear Fisher discriminant analysis

  • Author: Pietro Berkes
  • Affiliation: Institute for Theoretical Biology, Humboldt University Berlin, Berlin, Germany
  • Venue: ICANN'05: Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications, Part II
  • Year: 2005

Abstract

To generalize the Fisher Discriminant Analysis (FDA) algorithm to the case of discriminant functions belonging to a nonlinear, finite-dimensional function space F (Nonlinear FDA, or NFDA), it is sufficient to expand the input data by computing the output of a basis of F applied to it [1,2,3,4]. The solution to NFDA can then be found, as in the linear case, by solving a generalized eigenvalue problem on the between- and within-class covariance matrices (see e.g. [5]). The goal of NFDA is to find linear projections of the expanded data (i.e., nonlinear transformations of the original data) that minimize the variance within a class and maximize the variance between different classes. Such a representation is ideal for classification. The application of NFDA to pattern recognition is particularly appealing because, for a given input signal and a fixed function space, it has no free parameters and is easy to implement and apply. Moreover, given C classes, only C - 1 projections are relevant [5]. As a consequence, the feature space is very small, so the algorithm has low memory requirements and high speed during recognition.

Here we apply NFDA to a handwritten digit recognition problem using the MNIST database, a standard and freely available set of 70,000 handwritten digits (28 × 28 pixels), divided into a training set (60,000 digits) and a test set (10,000 digits). Several established pattern recognition methods have been applied to this database by LeCun et al. [6], whose paper provides a standard reference for benchmarking new algorithms. We perform NFDA on spaces of polynomials of a given degree d, whose basis functions include all monomials up to order d in all input variables. The problem quickly becomes intractable because of the high memory requirements, so the input dimensionality is first reduced by principal component analysis. On the preprocessed data we then apply NFDA by expanding the training patterns in the polynomial space and solving the linear FDA eigenvalue problem. As mentioned above, since we have 10 classes we only need to compute the first 9 eigenvectors. Since the within-class variance is minimized, patterns belonging to different classes tend to cluster in the feature space when projected onto the eigenvectors, so the digits can be classified with a simple method such as a Gaussian classifier.

We perform simulations with polynomials of degree 2 to 5. With polynomials of degree 2, the growth in the dimensionality of the expanded space with an increasing number of input dimensions is relatively limited, so it is possible to use up to 140 input dimensions. With higher-order polynomials one has to rely on a smaller number of input dimensions, but since the function space becomes larger and includes new nonlinearities, one obtains a remarkable improvement in performance. The best performance is achieved with polynomials of degree 3 and 35 input dimensions, with an error rate of 1.5% on test data. This error rate is comparable to, but does not outperform, that of the most elaborate algorithms (Table 1). The performance of NFDA is nevertheless remarkable considering the simplicity of the method and the fact that it incorporates no a priori knowledge of the problem, in contrast, for example, to the LeNet-5 algorithm [6], which was designed specifically for handwritten character recognition.
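As an illustration of the pipeline described above (PCA preprocessing, polynomial expansion, the linear FDA generalized eigenvalue problem, and a Gaussian classifier on the 9 projections), here is a minimal, self-contained sketch in Python. It is a reconstruction under stated assumptions, not the author's original code: the scikit-learn MNIST loader, the degree-2 configuration (chosen here to keep memory modest; the reported best result uses degree 3 with 35 dimensions), and the small ridge added to the within-class scatter are all choices made for runnability.

```python
# NFDA sketch: PCA preprocessing, polynomial expansion, linear FDA on the
# expanded data, and a Gaussian classifier on the resulting 9 projections.
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

# MNIST: 70,000 digits of 28 x 28 pixels, split 60,000 / 10,000.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)
X_tr, y_tr, X_te, y_te = X[:60000], y[:60000], X[60000:], y[60000:]

# Step 1: reduce the input dimensionality with PCA.
pca = PCA(n_components=35).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Step 2: expand into the polynomial space (all monomials up to degree d).
# Degree 2 keeps the expanded dimensionality small for this sketch; the
# abstract's best configuration (degree 3) is computationally heavier.
poly = PolynomialFeatures(degree=2, include_bias=False)
E_tr, E_te = poly.fit_transform(Z_tr), poly.transform(Z_te)

# Step 3: linear FDA on the expanded data, i.e. the generalized eigenvalue
# problem S_b w = lambda * S_w w on the between-class (S_b) and
# within-class (S_w) scatter matrices.
classes = np.unique(y_tr)
m_all = E_tr.mean(axis=0)
d = E_tr.shape[1]
S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
for c in classes:
    Ec = E_tr[y_tr == c]
    mc = Ec.mean(axis=0)
    S_w += (Ec - mc).T @ (Ec - mc)
    diff = (mc - m_all)[:, None]
    S_b += len(Ec) * (diff @ diff.T)
S_w += 1e-6 * np.trace(S_w) / d * np.eye(d)  # small ridge, an assumption
# Only C - 1 = 9 eigenvectors carry discriminative information; take the
# ones with the largest eigenvalues (eigh returns them in ascending order).
_, W = eigh(S_b, S_w, subset_by_index=[d - 9, d - 1])

# Step 4: Gaussian classifier in the 9-dimensional feature space.
F_tr, F_te = E_tr @ W, E_te @ W
stats = {c: (F_tr[y_tr == c].mean(axis=0), np.cov(F_tr[y_tr == c].T))
         for c in classes}

def log_gauss(f, mean, cov):
    """Unnormalized log-likelihood of f under a Gaussian N(mean, cov)."""
    diff = f - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (logdet + diff @ np.linalg.solve(cov, diff))

pred = np.array([max(classes, key=lambda c: log_gauss(f, *stats[c]))
                 for f in F_te])
print(f"test error rate: {np.mean(pred != y_te):.2%}")
```

Note that everything after the eigenvalue problem is deliberately simple: once the C - 1 = 9 projections are computed, classification reduces to evaluating 10 Gaussian log-likelihoods per digit, which is what gives the method its small memory footprint and high recognition speed.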
In addition, for recognition NFDA has to store and evaluate only 9 functions, and thus has small memory requirements and a high recognition speed. It is also possible to formulate NFDA using the kernel trick, in which case one can in principle use function spaces of infinite dimensionality [1,2,3,4]. In that formulation, however, the limiting factor is the number of training patterns, which makes it impractical for this application. The performance of NFDA could be further improved using, for example, more problem-specific preprocessing of the patterns (e.g., by enlarging the training set with new patterns generated by artificial distortions of the original ones), boosting techniques, or mixtures of experts with other algorithms [5,6].
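To make the kernel remark concrete: one common way to realize a kernelized FDA is through the empirical kernel map, where each pattern is represented by its kernel values against a set of reference training points, after which the same linear FDA machinery applies. The sketch below (reusing Z_tr and Z_te from the example above) is purely illustrative; the RBF kernel, the gamma value, and the subset of 1,000 reference points are assumptions made here, not choices from the paper.

```python
# Empirical-kernel-map sketch of a kernelized FDA (illustrative assumptions:
# RBF kernel, gamma=1e-3, 1,000 reference points). Reuses Z_tr / Z_te from
# the NFDA sketch above.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
ref = Z_tr[rng.choice(len(Z_tr), size=1000, replace=False)]
K_tr = rbf_kernel(Z_tr, ref, gamma=1e-3)   # shape (60000, 1000)
K_te = rbf_kernel(Z_te, ref, gamma=1e-3)   # shape (10000, 1000)
# K_tr / K_te now play the role of E_tr / E_te in Step 3 above. The feature
# dimensionality equals the number of reference patterns, so the full kernel
# formulation scales with the training set size -- exactly the limiting
# factor mentioned in the abstract.
```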