A scalable supervised algorithm for dimensionality reduction on streaming data

  • Authors:
  • Jun Yan; Benyu Zhang; Shuicheng Yan; Ning Liu; Qiang Yang; Qiansheng Cheng; Hua Li; Zheng Chen; Wei-Ying Ma

  • Affiliations:
  • LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Department of Mathematics, Tsinghua University, Beijing 100084, PR China; Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Department of Mathematics, School of Mathematical Science, Peking University, Beijing 100871, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2006

Quantified Score

Hi-index 0.07

Abstract

Algorithms for streaming data have attracted increasing attention over the past decade. Among them, dimensionality reduction algorithms are of particular interest because many real-world tasks depend on them. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores the label information that is valuable for classification. LDA, on the other hand, degrades in performance because the dimension of the low-dimensional space it can produce is limited and because it suffers from the singularity problem. Recently, the Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm does not fit the streaming data model and therefore cannot handle large-scale, high-dimensional data sets. An effective, efficient and scalable approach is thus needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, together with an extension, that infers adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and on real datasets demonstrate the superior performance of the proposed algorithm on streaming data.
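
For orientation, the criterion named in the abstract can be illustrated with a minimal batch sketch. This is not the incremental algorithm proposed in the paper. It assumes the commonly stated form of MMC, which maximizes tr(W^T (S_b - S_w) W) over orthonormal W, so that the columns of W are the leading eigenvectors of S_b - S_w. The function name `mmc_projection`, the toy data, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np


def mmc_projection(X, y, n_components):
    """Batch MMC sketch: leading eigenvectors of (S_b - S_w).

    Hypothetical helper for illustration; it recomputes the scatter
    matrices from all samples, so it is not a streaming method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    overall_mean = X.mean(axis=0)

    S_b = np.zeros((n_features, n_features))  # between-class scatter
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    for label in np.unique(y):
        X_c = X[y == label]
        prior = X_c.shape[0] / n_samples      # empirical class prior
        mean_c = X_c.mean(axis=0)
        diff = (mean_c - overall_mean)[:, None]
        S_b += prior * (diff @ diff.T)
        centered = X_c - mean_c
        S_w += prior * (centered.T @ centered) / X_c.shape[0]

    # S_b - S_w is symmetric, so eigh applies. No matrix inverse is
    # needed, which is why MMC avoids LDA's singularity problem.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]


# Toy data (assumed): classes differ along axis 0, heavy noise on axis 1.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-3.0, 0.0], scale=[0.5, 3.0], size=(200, 2))
X1 = rng.normal(loc=[3.0, 0.0], scale=[0.5, 3.0], size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

W = mmc_projection(X, y, n_components=1)
Z = X @ W  # one-dimensional supervised projection of the data
```

On this toy data the recovered direction should lie close to the first axis, where the class means differ. Because the sketch needs every sample in memory to build S_b and S_w, it also illustrates why the batch formulation does not fit the streaming setting the abstract describes.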