In regular statistical models, leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of cross-validation in such models has remained unknown. In previous studies, we established singular learning theory and proposed a widely applicable information criterion, whose expectation value is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss with the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable; therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to 2λ/n, where λ is the real log canonical threshold and n is the number of training samples. Therefore, the relation between the cross-validation error and the generalization error is determined by the algebraic geometrical structure of the learning machine. We also clarify that the deviance information criteria are different from both the Bayes cross-validation loss and the widely applicable information criterion.
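The two theorems can be sketched in symbols. The notation below is an assumption based on the abstract's own quantities: C_n for the Bayes leave-one-out cross-validation loss, W_n for the widely applicable information criterion, B_g and B_cv for the Bayes generalization and cross-validation errors, λ for the real log canonical threshold, and n for the sample size.

```latex
% Theorem 1 (equivalence as random variables, not merely in expectation):
C_n = W_n + o_p\!\left(\tfrac{1}{n}\right)

% Theorem 2 (the errors are coupled through the algebraic geometry of the
% model via the real log canonical threshold \lambda):
B_g + B_{cv} = \frac{2\lambda}{n} + o_p\!\left(\tfrac{1}{n}\right)
```

Read this way, Theorem 2 says the two errors move in opposite directions around the constant 2λ/n: a model whose cross-validation error happens to be small on a given sample tends to have a correspondingly larger generalization error, with the sum fixed to leading order.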