Selecting the number of clusters and the number of hidden factors for a Local Factor Analysis (LFA) model is a typical model selection problem, and a difficult one when the sample size is finite or small. Data smoothing is one of three regularization techniques integrated into the statistical learning framework of Bayesian Ying-Yang (BYY) harmony learning theory to improve parameter learning and model selection. In this paper, we comparatively investigate the performance of five existing formulas for determining the hyper-parameter, namely the smoothing parameter, which controls the strength of data smoothing regularization. BYY learning algorithms for LFA using these formulas are evaluated by model selection accuracy on simulated data and by classification accuracy on real-world data. Two observations are obtained. First, learning with data smoothing works better than learning without it, especially when the sample size is small. Second, the gradient method derived from imposing a sample-set-based improper prior on the smoothing parameter generally outperforms the other methods, such as those based on a Gamma or Chi-square prior and the one under the equal covariance principle.
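To illustrate the role of the smoothing parameter, the sketch below shows the common data-smoothing device of adding h²I to a sample covariance, which corresponds to convolving each sample with an isotropic Gaussian of width h (the noise-injection view of Tikhonov-style regularization). This is a minimal, hypothetical illustration, not the paper's BYY-LFA algorithm; the function name `smoothed_covariance` and the example dimensions are assumptions for demonstration only.

```python
import numpy as np

def smoothed_covariance(X, h):
    """Sample covariance regularized by data smoothing.

    Adding h^2 * I to the plain sample covariance is equivalent to
    smoothing each data point with an isotropic Gaussian of width h.
    h = 0 recovers the unregularized estimate.  (Illustrative sketch,
    not the paper's BYY-LFA learning algorithm.)
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    return S + (h ** 2) * np.eye(X.shape[1])

# With fewer samples than dimensions, the plain covariance is
# rank-deficient; smoothing keeps the estimate well-conditioned,
# which is the small-sample regime where regularization helps most.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))            # n=5 samples, d=10 dims
S_plain = smoothed_covariance(X, 0.0)
S_smooth = smoothed_covariance(X, 0.5)
print(np.linalg.matrix_rank(S_plain))       # at most 4 (< 10)
print(np.all(np.linalg.eigvalsh(S_smooth) > 0))
```

A larger h trades variance for bias: it prevents degenerate covariances when samples are scarce, at the cost of inflating every eigenvalue, which is why the formulas compared in the paper aim to pick h automatically rather than fix it by hand.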