The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network

Authors:
P. L. Bartlett
Affiliations:
Dept. of Syst. Eng., Australian Nat. Univ., Canberra, ACT
Venue:
IEEE Transactions on Information Theory
Year:
2006

Citing 0
Cited 119

Improved boosting algorithms using confidence-rated predictions

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Further results on the margin distribution

COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Using Decision Trees to Construct a Practical Parser

Machine Learning - Special issue on natural language learning
Parameter convergence and learning curves for neural networks

Neural Computation
Improved Boosting Algorithms Using Confidence-rated Predictions

Machine Learning - The Eleventh Annual Conference on Computational Learning Theory
Improved Generalization Through Explicit Optimization of Margins

Machine Learning
On the VC Dimension of Bounded Margin Classifiers

Machine Learning
A re-weighting strategy for improving margins

Artificial Intelligence
Model complexity control and statistical learning theory

Natural Computing: an international journal
On the Dual Formulation of Regularized Linear Systems with Convex Risks

Machine Learning
Model Selection and Error Estimation

Machine Learning
Learning-Based Complexity Evaluation of Radial Basis Function Networks

Neural Processing Letters
Generalization Ability of Folding Networks

IEEE Transactions on Knowledge and Data Engineering
Mathematical Modelling of Generalization

WIRN VIETRI 2002 Proceedings of the 13th Italian Workshop on Neural Nets-Revised Papers
Kernel Based Image Classification

ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
Large Margin Nearest Neighbor Classifiers

IWANN '01 Proceedings of the 6th International Work-Conference on Artificial and Natural Neural Networks: Connectionist Models of Neurons, Learning Processes and Artificial Intelligence-Part I
Theoretical Views of Boosting

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
Theoretical Views of Boosting and Applications

ALT '99 Proceedings of the 10th International Conference on Algorithmic Learning Theory
A Note on the Generalization Performance of Kernel Classifiers with Margin

ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
A Generalized Class of Boosting Algorithms Based on Recursive Decoding Models

MCS '01 Proceedings of the Second International Workshop on Multiple Classifier Systems
Agnostic Boosting

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
On Agnostic Learning with {0, *, 1}-Valued and Real-Valued Hypotheses

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Further Explanation of the Effectiveness of Voting Methods: The Game between Margins and Weights

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Data-Dependent Margin-Based Generalization Bounds for Classification

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Bounds on the Generalization Ability of Bayesian Inference and Gibbs Algorithms

ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
Generalization Performance of Classifiers in Terms of Observed Covering Numbers

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
Reducing multiclass to binary: a unifying approach for margin classifiers

The Journal of Machine Learning Research
Covering number bounds of certain regularized linear function classes

The Journal of Machine Learning Research
Data-dependent margin-based generalization bounds for classification

The Journal of Machine Learning Research
Rademacher and Gaussian complexities: risk bounds and structural results

The Journal of Machine Learning Research
An efficient boosting algorithm for combining preferences

The Journal of Machine Learning Research
Learning the Kernel Matrix with Semidefinite Programming

The Journal of Machine Learning Research
Generalization Error Bounds for Threshold Decision Lists

The Journal of Machine Learning Research
Margin based feature selection - theory and algorithms

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Support Vector Machine Soft Margin Classifiers: Error Analysis

The Journal of Machine Learning Research
A Fixed-Distribution PAC Learning Theory for Neural FIR Models

Journal of Intelligent Information Systems
SVM Soft Margin Classifiers: Linear Programming versus Quadratic Programming

Neural Computation
Robust Formulations for Training Multilayer Perceptrons

Neural Computation
Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks

Neural Computation
Classification-based objective functions

Machine Learning
Parameter estimation for statistical parsing models: theory and practice of distribution-free methods

New developments in parsing technology
New Support Vector Algorithms

Neural Computation
CB3: An Adaptive Error Function for Backpropagation Training

Neural Processing Letters
Terminated Ramp-Support Vector Machines: A nonparametric data dependent kernel

Neural Networks
On the generalization error of fixed combinations of classifiers

Journal of Computer and System Sciences
Backward elimination model construction for regression and classification using leave-one-out criteria

International Journal of Systems Science
Estimates of covering numbers of convex sets with slowly decaying orthogonal subsets

Discrete Applied Mathematics
Aspects of discrete mathematics and probability in the theory of machine learning

Discrete Applied Mathematics
Relation between weight size and degree of over-fitting in neural network regression

Neural Networks
VC Theory of Large Margin Multi-Category Classifiers

The Journal of Machine Learning Research
Variations of the two-spiral task

Connection Science
Aggregation of SVM Classifiers Using Sobolev Spaces

The Journal of Machine Learning Research
Learning rates for regularized classifiers using multivariate polynomial kernels

Journal of Complexity
Exploring Margin Maximization for Biometric Score Fusion

SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Neural Network with Matrix Inputs

Informatica
Modeling of Thin Film Process Data Using a Genetic Algorithm-Optimized Initial Weight of Backpropagation Neural Network

Applied Artificial Intelligence
γ-C plane and robustness in static reservoir for nonlinear regression estimation

Neurocomputing
Small Number of Hidden Units for ELM with Two-Stage Linear Model

IEICE - Transactions on Information and Systems
Generalization performance of ν-support vector classifier based on conditional value-at-risk minimization

Neurocomputing
Comparison of nonlinear methods for hematocrit estimation from the transduced anodic current curve

MAMECTIS'08 Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems
Neural networks and multimedia datasets: estimating the size of neural networks for achieving high classification accuracy

MUSP'09 Proceedings of the 9th WSEAS international conference on Multimedia systems & signal processing
Towards a Linear Combination of Dichotomizers by Margin Maximization

ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
Performance prediction for exponential language models

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A model of inductive bias learning

Journal of Artificial Intelligence Research
A framework for kernel-based multi-category classification

Journal of Artificial Intelligence Research
A brief introduction to boosting

IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Designing neural networks for tackling hard classification problems

WSEAS TRANSACTIONS on SYSTEMS
A simple additive re-weighting strategy for improving margins

IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Maximal width learning of binary functions

Theoretical Computer Science
Weight-decay regularization in reproducing Kernel Hilbert spaces by variable-basis schemes

WSEAS Transactions on Mathematics
A new constructive algorithm for architectural and functional adaptation of artificial neural networks

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Applications of multi-objective structure optimization

Neurocomputing
Rapid and brief communication: Evolutionary extreme learning machine

Pattern Recognition
Margin-based Ranking and an Equivalence between AdaBoost and RankBoost

The Journal of Machine Learning Research
Estimating the size of neural networks from the number of available training data

ICANN'07 Proceedings of the 17th international conference on Artificial neural networks
Online training for single hidden-layer

CIRA'09 Proceedings of the 8th IEEE international conference on Computational intelligence in robotics and automation
Large margin cost-sensitive learning of conditional random fields

Pattern Recognition
A supervised combination strategy for illumination chromaticity estimation

ACM Transactions on Applied Perception (TAP)
Optimization method based extreme learning machine for classification

Neurocomputing
Estimates on weight-decay regularization by variable-basis schemes

ACS'09 Proceedings of the 9th WSEAS international conference on Applied computer science
A Note on a priori Estimations of Classification Circuit Complexity

Fundamenta Informaticae - Hardest Boolean Functions and O.B. Lupanov
Logistic classification with varying Gaussians

Computers & Mathematics with Applications
Least square regression with lp-coefficient regularization

Neural Computation
Sequence classification via large margin hidden Markov models

Data Mining and Knowledge Discovery
Composite Function Wavelet Neural Networks with Differential Evolution and Extreme Learning Machine

Neural Processing Letters
Dynamic construction of multilayer neural networks for classification

ISNN'11 Proceedings of the 8th international conference on Advances in neural networks - Volume Part I
Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks

Pattern Recognition Letters
Voting based extreme learning machine

Information Sciences: an International Journal
Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks

International Journal of Applied Mathematics and Computer Science
Robust cutpoints in the logical analysis of numerical data

Discrete Applied Mathematics
A novel learning algorithm for feedforward neural networks

ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I
A fast learning algorithm based on layered hessian approximations and the pseudoinverse

ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I
Nature inspiration for support vector machines

KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part II
An improved extreme learning machine based on particle swarm optimization

ICIC'11 Proceedings of the 7th international conference on Intelligent Computing: bio-inspired computing and applications
An Epicurean learning approach to gene-expression data classification

Artificial Intelligence in Medicine
Book review

Automatica (Journal of IFAC)
A comparison of complexity selection approaches for polynomials based on: Vapnik-Chervonenkis dimension, Rademacher complexity and covering numbers

ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Diversity regularized machine

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
A framework for automatic TRIZ level of invention estimation of patents using natural language processing, knowledge-transfer and patent citation metrics

Computer-Aided Design
Learning Rates for Regularized Classifiers Using Trigonometric Polynomial Kernels

Neural Processing Letters
Analysis of a multi-category classifier

Discrete Applied Mathematics
Musical pitch estimation using a supervised single hidden layer feed-forward neural network

Expert Systems with Applications: An International Journal
Sales forecasting for computer wholesalers: A comparison of multivariate adaptive regression splines and artificial neural networks

Decision Support Systems
Robust 3d action recognition with random occupancy patterns

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Diversity regularized ensemble pruning

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
Matrix pseudoinversion for image neural processing

ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part V
Cross-validation of bimodal health-related stress assessment

Personal and Ubiquitous Computing
Generalized classifier neural network

Neural Networks
Full length article: Approximation by multivariate Bernstein-Durrmeyer operators and learning rates of least-squares regularized regression with multivariate polynomial kernels

Journal of Approximation Theory
Semi-supervised learning of hidden conditional random fields for time-series classification

Neurocomputing
Multilayer perceptron for the learning of spatio-temporal dynamics-application in thermal engineering

Engineering Applications of Artificial Intelligence
Eigenvalue decay: A new method for neural network regularization

Neurocomputing
A hybrid approach combining extreme learning machine and sparse representation for image classification

Engineering Applications of Artificial Intelligence
Predicting minority class for suspended particulate matters level by extreme learning machine

Neurocomputing
2-D defect profile reconstruction from ultrasonic guided wave signals based on QGA-kernelized ELM

Neurocomputing
Genetic ensemble of extreme learning machine

Neurocomputing
Generalization Bounds of Regularization Algorithm with Gaussian Kernels

Neural Processing Letters
Learning bounds via sample width for classifiers on finite metric spaces

Theoretical Computer Science

Quantified Score

Hi-index	754.84

Abstract

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by $A$ and the input dimension is $n$. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3 \sqrt{(\log n)/m}$ (ignoring $\log A$ and $\log m$ factors), where $m$ is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.
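
To get a feel for how the weight-size term in the abstract scales, the following Python sketch evaluates A^3 * sqrt(log(n) / m) for given values of A, n and m. It is only an illustration under stated assumptions: the abstract gives the term up to constants and up to log A and log m factors, so the numbers show relative scaling, not an actual error bound. The function names `weight_size_term` and `examples_needed` are hypothetical and do not come from the paper.

```python
import math


def weight_size_term(A: float, n: int, m: int) -> float:
    """Scaling term A^3 * sqrt(log(n) / m) from the abstract.

    Assumption: constants and the log A, log m factors that the
    abstract ignores are dropped here as well.
    """
    if A <= 0 or n < 1 or m < 1:
        raise ValueError("need A > 0, n >= 1, m >= 1")
    return A ** 3 * math.sqrt(math.log(n) / m)


def examples_needed(A: float, n: int, epsilon: float) -> int:
    """Smallest m for which the scaling term is at most epsilon.

    Solving A^3 * sqrt(log(n) / m) <= epsilon for m gives
    m >= A^6 * log(n) / epsilon^2 (same simplifications as above).
    """
    if epsilon <= 0:
        raise ValueError("need epsilon > 0")
    return max(1, math.ceil(A ** 6 * math.log(n) / epsilon ** 2))


if __name__ == "__main__":
    # The term does not involve the number of weights at all:
    # only the weight bound A, the input dimension n and the
    # number of training patterns m appear.
    for m in (1_000, 4_000, 16_000):
        print(m, weight_size_term(A=2.0, n=100, m=m))
```

Under these simplifications, quadrupling m halves the term, while doubling A multiplies it by eight, which is one way to read the abstract's remark that keeping the weights small matters more than keeping the network small.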