An incremental subspace learning algorithm to categorize large scale text data

  • Authors:
  • Jun Yan;Qiansheng Cheng;Qiang Yang;Benyu Zhang

  • Affiliations:
  • LMAM, Department of Information Science, School of Mathematical Sciences, Peking University, Beijing, P.R. China;LMAM, Department of Information Science, School of Mathematical Sciences, Peking University, Beijing, P.R. China;Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong;Microsoft Research Asia, Beijing, P.R. China

  • Venue:
  • APWeb'05: Proceedings of the 7th Asia-Pacific Web Conference on Web Technologies Research and Development
  • Year:
  • 2005

Abstract

The dramatic growth in the number and size of online information sources has fueled increasing research interest in the incremental subspace learning problem. In this paper, we propose an incremental supervised subspace learning algorithm called the Incremental Inter-class Scatter (IIS) algorithm. Unlike traditional batch learners, IIS learns from a stream of training data rather than from a fixed set. IIS overcomes the inherent problem of some other incremental approaches, such as incremental versions of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Experimental results on synthetic datasets show that IIS performs as well as LDA and is more robust to noise. In addition, experiments on the Reuters Corpus Volume 1 (RCV1) dataset show that IIS outperforms the state-of-the-art Incremental Principal Component Analysis (IPCA) algorithm, a related algorithm, and Information Gain in efficiency and effectiveness, respectively.
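
To make the stream-versus-set distinction concrete, here is a minimal sketch, not the authors' IIS update rule, which the abstract does not spell out. It assumes the standard definition of the inter-class scatter matrix, S_b = Σ_i p_i (m_i − m)(m_i − m)^T, keeps only running sums per class as labelled samples arrive one at a time, and recovers a leading projection direction by power iteration. The class name `StreamingInterClassScatter` and its methods are hypothetical names chosen for illustration.

```python
class StreamingInterClassScatter:
    """Illustrative only: running class sums from a stream, scatter on demand."""

    def __init__(self, dim):
        self.dim = dim
        self.total = 0                      # samples seen so far
        self.global_sum = [0.0] * dim       # sum of all samples
        self.class_count = {}               # label -> number of samples
        self.class_sum = {}                 # label -> per-feature sum

    def partial_fit(self, x, label):
        """Consume one labelled sample; no earlier sample is stored."""
        self.total += 1
        sums = self.class_sum.setdefault(label, [0.0] * self.dim)
        self.class_count[label] = self.class_count.get(label, 0) + 1
        for j, value in enumerate(x):
            self.global_sum[j] += value
            sums[j] += value

    def scatter(self):
        """Inter-class scatter S_b = sum_i p_i (m_i - m)(m_i - m)^T."""
        mean = [s / self.total for s in self.global_sum]
        s_b = [[0.0] * self.dim for _ in range(self.dim)]
        for label, n in self.class_count.items():
            d = [s / n - m for s, m in zip(self.class_sum[label], mean)]
            weight = n / self.total         # class prior estimate p_i
            for i in range(self.dim):
                for j in range(self.dim):
                    s_b[i][j] += weight * d[i] * d[j]
        return s_b

    def leading_direction(self, iterations=100):
        """Power iteration for the dominant eigenvector of S_b.

        Assumes the arbitrary start vector is not orthogonal to that
        eigenvector; if S_b is zero, the start vector is returned.
        """
        s_b = self.scatter()
        v = [1.0 + 0.1 * i for i in range(self.dim)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
        for _ in range(iterations):
            w = [sum(s_b[i][j] * v[j] for j in range(self.dim))
                 for i in range(self.dim)]
            norm = sum(c * c for c in w) ** 0.5
            if norm == 0.0:
                break
            v = [c / norm for c in w]
        return v


# Toy stream: two classes separated along the first axis only.
stream = [([0.0, 0.0], "a"), ([0.0, 1.0], "a"),
          ([4.0, 0.0], "b"), ([4.0, 1.0], "b")]
model = StreamingInterClassScatter(dim=2)
for x, label in stream:
    model.partial_fit(x, label)
direction = model.leading_direction()
```

In this toy stream the class means differ only in the first coordinate, so the recovered direction lies along the first axis. Memory use depends on the number of classes and features, not on the number of samples seen, which is the property a stream learner needs. For high-dimensional text data the explicit d-by-d matrix built in `scatter` would itself be too large, so a practical method would have to update the projection directions without forming it.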