Estimation of Time-Varying Parameters in Statistical Models: An Optimization Approach
Machine Learning - Special issue: computational learning theory, COLT '97
Nonparametric Time Series Prediction Through Adaptive Model Selection
Machine Learning
Model complexity control and statistical learning theory
Natural Computing: an international journal
Hardness results for neural network approximation problems
Theoretical Computer Science
Hardness Results for Neural Network Approximation Problems
EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
Agnostic Learning Nonconvex Function Classes
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Localized Rademacher Complexities
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Covering number bounds of certain regularized linear function classes
The Journal of Machine Learning Research
Comparing Bayes model averaging and stacking when model approximation error cannot be ignored
The Journal of Machine Learning Research
Motion Estimation Using Statistical Learning Theory
IEEE Transactions on Pattern Analysis and Machine Intelligence
Agnostically Learning Halfspaces
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Almost Linear VC-Dimension Bounds for Piecewise Polynomial Networks
Neural Computation
Nonlinear function approximation: Computing smooth solutions with an adaptive greedy algorithm
Journal of Approximation Theory
Multi-kernel regularized classifiers
Journal of Complexity
Density estimation with stagewise optimization of the empirical risk
Machine Learning
Vapnik-Chervonenkis generalization bounds for real valued neural networks
Neural Computation
Finite-Time Bounds for Fitted Value Iteration
The Journal of Machine Learning Research
Convergence analysis of convex incremental neural networks
Annals of Mathematics and Artificial Intelligence
Estimation of a regression function by maxima of minima of linear functions
IEEE Transactions on Information Theory
Large-margin classification in infinite neural networks
Neural Computation
Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints
SIAM Journal on Optimization
Approximation and estimation bounds for free knot splines
Computers & Mathematics with Applications
Variable selection in high-dimension with random designs and orthogonal matching pursuit
The Journal of Machine Learning Research
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension of the probably approximately correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations, and the learner is required to approximate the neural network that minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts, and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation steps of the learning algorithm is bounded by a polynomial in 1/ε, 1/δ, n, and B, where ε is the desired accuracy, δ is the probability that the algorithm fails, n is the input dimension, and B is the bound on both the absolute value of the target (which may be a random variable) and the sum of the absolute values of the output weights. In obtaining this result, we also extend some results on iterative approximation of functions in the closure of the convex hull of a function class, and on the sample complexity of agnostic learning with the quadratic loss function.
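The hypothesis class described above can be made concrete with a short sketch. The following Python snippet (all names and the toy setup are illustrative, not from the paper) evaluates a two-layer network of linear threshold hidden units whose output weights satisfy an L1 bound B, and computes the empirical quadratic risk that the agnostic learner seeks to minimize in expectation:

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): a two-layer network with
# threshold hidden units and output weights a_j satisfying sum_j |a_j| <= B.

def threshold_unit(x, w, t):
    """Linear threshold unit: outputs 1 if w . x >= t, else 0.
    The fan-in of the unit is the number of nonzero entries of w."""
    return (x @ w >= t).astype(float)

def two_layer_net(x, units, out_weights, B):
    """Evaluate sum_j a_j * 1[w_j . x >= t_j] on each row of x,
    checking the L1 constraint sum_j |a_j| <= B on the output weights."""
    assert np.sum(np.abs(out_weights)) <= B + 1e-9, "output-weight bound violated"
    h = np.stack([threshold_unit(x, w, t) for (w, t) in units], axis=1)
    return h @ out_weights

def empirical_quadratic_risk(preds, y):
    """Sample average of the quadratic loss (preds - y)^2."""
    return np.mean((preds - y) ** 2)
```

The learner's goal, in this notation, is a network whose expected quadratic risk is within ε of the best achievable over all networks obeying the fan-in and output-weight bounds; the empirical risk above is the sample-based surrogate.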