Regularization Learning and Early Stopping in Linear Networks

  • Authors:
  • Affiliations:
  • Venue: IJCNN '00 Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00) - Volume 4
  • Year: 2000


Abstract

Generally, learning is performed to minimize the sum of squared errors between network outputs and training data. Unfortunately, this procedure does not necessarily yield a network with good generalization ability when the number of connection weights is relatively large; in such a situation, overfitting to the training data occurs. To overcome this problem, several approaches exist, such as regularization learning [6][11][12][16] and early stopping [2][15]. It has been suggested that these two methods are closely related [4][5][8][14]. In this article, we first give a unified interpretation of the relationship between the two methods through an analysis of linear networks in the context of statistical regression, i.e., the linear regression model. Several theoretical works have also addressed the optimal regularization parameter [6][11][12][16] and the optimal stopping time [2][15]. Here, we consider these problems from the unified viewpoint mentioned above. This analysis enables us to understand the statistical meaning of the optimality. Estimates of the optimal regularization parameter and the optimal stopping time are then presented and examined through simple numerical simulations. Moreover, for the choice of the regularization parameter, the relationship between the Bayesian framework and the generalization-error-minimization framework is discussed.
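
To make the regularization/early-stopping connection concrete, the following is a minimal numerical sketch, not taken from the paper: the synthetic data, the step size, and the heuristic mapping lambda ≈ 1/(eta · t) are illustrative assumptions. For a linear regression model, gradient descent on the unregularized squared error stopped after t steps shrinks the solution in each singular direction much like ridge regression with regularization parameter of order 1/(eta · t).

```python
# Sketch: early-stopped gradient descent vs. ridge regression on a
# linear model y = X w + noise (all settings are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form minimizer of (1/2)||Xw - y||^2 + (lam/2)||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def early_stopped_gd(X, y, eta, steps):
    """Gradient descent on the plain loss (1/2)||Xw - y||^2, stopped early."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= eta * (X.T @ (X @ w - y))  # gradient of the unregularized loss
    return w

eta, steps = 1e-3, 200
w_gd = early_stopped_gd(X, y, eta, steps)
w_ridge = ridge(X, y, lam=1.0 / (eta * steps))  # heuristic lambda ~ 1/(eta*t)
print("||w_gd - w_ridge|| =", np.linalg.norm(w_gd - w_ridge))
```

Under this mapping the two estimators track each other closely: both damp the components of the least-squares solution associated with small singular values of X, which is the shared mechanism behind the relationship the paper analyzes.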