A theoretical investigation of several model selection criteria for dimensionality reduction

  • Authors:
  • Shikui Tu;Lei Xu

  • Affiliations:
  • Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, PR China;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, PR China

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Abstract

Using the problem of determining the hidden dimensionality (the number of latent factors) of the Factor Analysis (FA) model, this paper provides a theoretical comparison of several classical model selection criteria: Akaike's Information Criterion (AIC), Bozdogan's Consistent Akaike's Information Criterion (CAIC), the Hannan-Quinn information criterion (HQC), and Schwarz's Bayesian Information Criterion (BIC). We focus on building a partial order of their relative underestimation tendency. The order is shown to be AIC, HQC, BIC, and CAIC, from the smallest underestimation probability to the largest. To a great extent this order also reflects model selection performance, because underestimation usually accounts for most wrong selections as the sample size and the population signal-to-noise ratio (SNR, defined as the ratio of the smallest variance of the hidden dimensions to the variance of the noise) decrease. Synthetic experiments that vary the SNR and the training sample size N verify the theoretical results.
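One way to see why the order AIC, HQC, BIC, CAIC is plausible is to compare the penalty each criterion adds per free parameter. The sketch below uses the standard textbook forms of the four criteria on the -2 log-likelihood scale; it is not taken from the paper, and the function names and the log-likelihood values in the usage example are hypothetical, chosen only for illustration. A heavier penalty favors smaller models, so it tends to raise the chance of underestimating the dimensionality.

```python
import math


def penalty_per_parameter(n_samples):
    """Penalty added per free parameter, on the -2 log-likelihood scale.

    Standard textbook forms (an assumption; the paper may use an
    equivalent rescaling):
      AIC  : 2
      HQC  : 2 * ln(ln N)
      BIC  : ln N
      CAIC : ln N + 1
    """
    log_n = math.log(n_samples)
    return {
        "AIC": 2.0,
        "HQC": 2.0 * math.log(log_n),
        "BIC": log_n,
        "CAIC": log_n + 1.0,
    }


def criterion_value(log_likelihood, n_params, n_samples, name):
    """Criterion value for one candidate model; smaller is better."""
    penalty = penalty_per_parameter(n_samples)[name]
    return -2.0 * log_likelihood + penalty * n_params


def select_dimension(log_likelihoods, n_params, n_samples, name):
    """Return the candidate dimension m that minimizes the criterion.

    log_likelihoods and n_params are dicts keyed by the candidate
    dimension m: the maximized log-likelihood and the number of free
    parameters of the model fitted with m latent factors.
    """
    scores = {
        m: criterion_value(log_likelihoods[m], n_params[m], n_samples, name)
        for m in log_likelihoods
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical numbers, not results from the paper.
    n_samples = 100
    log_likelihoods = {1: -520.0, 2: -505.0, 3: -500.0}
    n_params = {1: 10, 2: 14, 3: 17}

    print(penalty_per_parameter(n_samples))
    for name in ("AIC", "HQC", "BIC", "CAIC"):
        m = select_dimension(log_likelihoods, n_params, n_samples, name)
        print(name, "selects m =", m)
```

The per-parameter penalties satisfy HQC < BIC < CAIC for every N > 1, and AIC < HQC once ln(ln N) > 1, that is, for N of about 16 or more. This matches the ordering stated in the abstract, although the paper's argument concerns underestimation probabilities rather than penalties alone.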