In machine learning problems, a learning algorithm tries to learn the input-output dependency (relationship) of a system from a training dataset. This input-output relationship is usually corrupted by random noise. From experience, simulations, and special-case theory, most practitioners believe that increasing the size of the training set improves the performance of the learning algorithm. It is shown that this belief does not hold in general for every pairing of learning algorithm and data distribution. In particular, it is proven that for certain distributions and learning algorithms, increasing the training set size may result in worse performance, and increasing it without bound may result in the worst possible performance, even when there is no model misspecification for the input-output relationship. Simulation results and analyses of real datasets are provided to support the mathematical argument.
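One intuition for how this can happen, in line with the stable-distribution setting the work draws on, is that under heavy-tailed (α-stable) noise the sample mean does not concentrate as the training set grows. The sketch below is not taken from the paper; it is a minimal illustration under assumed choices (a two-class problem with true class centres at -1 and +1, a plug-in nearest-sample-mean rule, and standard Cauchy noise, i.e. α-stable with α = 1). It shows the weaker half of the claim, that more data can fail to help, rather than the paper's stronger result that performance can strictly worsen:

```python
import math
import random

def draw(dist, loc, rng):
    """One noisy observation centred at loc."""
    if dist == "cauchy":
        # Standard Cauchy via inverse CDF (alpha-stable with alpha = 1).
        return loc + math.tan(math.pi * (rng.random() - 0.5))
    return loc + rng.gauss(0.0, 1.0)  # standard Gaussian noise

def error_rate(dist, n_train, n_test=500, n_trials=100):
    """Average test error of a plug-in nearest-sample-mean classifier."""
    total = 0.0
    for trial in range(n_trials):
        rng = random.Random(trial)
        # Estimate each class centre (true values -1 and +1) by averaging
        # n_train noisy training draws.
        m_neg = sum(draw(dist, -1.0, rng) for _ in range(n_train)) / n_train
        m_pos = sum(draw(dist, +1.0, rng) for _ in range(n_train)) / n_train
        errors = 0
        for _ in range(n_test):
            positive = rng.random() < 0.5
            x = draw(dist, 1.0 if positive else -1.0, rng)
            predicted_positive = abs(x - m_pos) < abs(x - m_neg)
            errors += predicted_positive != positive
        total += errors / n_test
    return total / n_trials
```

Under Gaussian noise the sample means concentrate, so the error approaches the Bayes rate as `n_train` grows. Under Cauchy noise the sample mean of n draws is itself standard Cauchy for every n, so averaging more training data buys nothing: the class-centre estimates remain just as unreliable, and the error stays well above the Gaussian figure regardless of training set size.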