Ideally, pattern recognition machines provide constant output when the inputs are transformed under a group G of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of G, while leaving the corresponding targets unchanged. Alternatively, the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed under the group. This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed examples to the training data. We introduce the notion of a probability distribution over the group transformations, and use this to rewrite the cost function for the enhanced training data. Under certain conditions, the new cost function is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice: a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspondence provides a simple bridge between the two approaches.
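The equivalence described in the abstract can be illustrated numerically. Below is a minimal sketch (not from the paper) assuming a linear model f(x) = w·x and a transformation group of additive shifts x → x + ε with ε ~ N(0, σ²): averaging the squared-error cost over transformed inputs, with targets unchanged, yields the original cost plus a penalty whose coefficient is the variance σ² of the distortions. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a fixed linear model f(x) = w * x (hypothetical setup).
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)
w = 2.5
sigma = 0.3  # scale of the distortions drawn from the group

def sq_loss(w, x, y):
    """Original (unaugmented) mean squared-error cost."""
    return np.mean((w * x - y) ** 2)

# Cost with enhanced training data: average over many draws of the
# group element eps, leaving the targets unchanged.
eps = rng.normal(scale=sigma, size=(10000, x.size))
augmented = np.mean((w * (x + eps) - y) ** 2)

# Regularized cost: the original cost plus a term penalizing output
# change under the transformation. For this linear model that term is
# sigma^2 * w^2 -- the variance of the distortions times the squared
# sensitivity of the output to the input.
regularized = sq_loss(w, x, y) + sigma ** 2 * w ** 2

# The two costs agree up to Monte Carlo error.
print(abs(augmented - regularized))
```

For this model the algebra is immediate: E[(w(x+ε) − y)²] = (wx − y)² + w²σ², since the cross term vanishes for zero-mean ε. Nonlinear models admit the same expansion only for infinitesimal transformations, which is the regime the abstract's final claim addresses.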