Efficient kernel feature extraction for massive data sets

  • Authors:
  • Ivor W. Tsang;Andras Kocsor;James T. Kwok

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong;University of Szeged, Hungary;Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Maximum margin discriminant analysis (MMDA) was proposed that uses the margin idea for feature extraction. It often outperforms traditional methods like kernel principal component analysis (KPCA) and kernel Fisher discriminant analysis (KFD). However, as in other kernel methods, its time complexity is cubic in the number of training points m, and is thus computationally inefficient on massive data sets. In this paper, we propose an (1+ε)2-approximation algorithm for obtaining the MMDA features by extending the core vector machines. The resultant time complexity is only linear in m, while its space complexity is independent of m. Extensive comparisons with the original MMDA, KPCA, and KFD on a number of large data sets show that the proposed feature extractor can improve classification accuracy, and is also faster than these kernel-based methods by more than an order of magnitude.