This letter analyzes the Fisher kernel from a statistical point of view. The Fisher kernel is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data (i.e., of the underlying data density). It is important to analyze and ultimately understand the statistical properties of the Fisher kernel. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable (i.e., it contains the true distribution). Realizability immediately leads to consistency results. Subsequently, we focus on an asymptotic analysis of the generalization error, which elucidates the learning curves of the Fisher kernel and how unlabeled data contribute to learning. We also point out that when a linear classifier is used together with the Fisher kernel, the squared loss and the log loss are theoretically preferable to other losses, such as the exponential loss, because both yield consistent estimators. This letter therefore underscores that the Fisher kernel should be viewed not as a heuristic but as a powerful tool with well-controlled statistical properties.
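For readers unfamiliar with the construction, the Fisher kernel analyzed here follows the standard recipe of Jaakkola and Haussler: fit a generative model p(x | θ) to the (possibly unlabeled) data, map each input to its Fisher score, and feed the resulting kernel to a linear classifier. A minimal sketch of that construction, under the usual assumptions (θ̂ denotes a maximum-likelihood estimate and I(θ̂) the Fisher information matrix; neither symbol appears in the abstract itself):

% Standard Fisher-kernel construction (a sketch, not the letter's notation):
% each input x is mapped to its Fisher score, the gradient of the
% log-likelihood of the fitted generative model p(x | theta) at theta-hat.
\[
  \phi_{\hat{\theta}}(x) \;=\; \nabla_{\theta} \log p(x \mid \theta)\,\Big|_{\theta = \hat{\theta}},
  \qquad
  K(x, x') \;=\; \phi_{\hat{\theta}}(x)^{\top}\, I(\hat{\theta})^{-1}\, \phi_{\hat{\theta}}(x').
\]

The letter's remark about losses then concerns a linear classifier trained on these score features: with the squared or log loss the resulting estimator of the posterior probability is consistent, whereas the exponential loss carries no such guarantee.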