Fast dimension reduction for document classification based on imprecise spectrum analysis

  • Authors:
  • Hu Guan;Bin Xiao;Jingyu Zhou;Minyi Guo;Tao Yang

  • Affiliations:
  • Shanghai Jiao Tong Univ., Shanghai, China;HK Polytechnic Univ., Hong Kong, Hong Kong;Shanghai Jiao Tong Univ., Shanghai, China;Shanghai Jiao Tong Univ., Shanghai, China;University of California at Santa Barbara, California, USA

  • Venue:
  • CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
  • Year:
  • 2010

Abstract

This paper proposes an algorithm called Imprecise Spectrum Analysis (ISA) for fast dimension reduction in document classification. ISA is based on the one-sided Jacobi method for Singular Value Decomposition (SVD). To speed up dimension reduction, it simplifies the orthogonalization process of the Jacobi computation and introduces a new mapping formula for transforming the original document-term vectors. To improve classification accuracy with ISA, a feature selection method is also developed that makes inter-class feature vectors more orthogonal when building the initial weighted term-document matrix. Our experimental results show that ISA is extremely fast in handling large term-document matrices and delivers better or competitive classification accuracy compared to SVD-based LSI.
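For readers unfamiliar with the method ISA starts from, the following is a minimal sketch of the textbook one-sided Jacobi (Hestenes) iteration, which repeatedly rotates pairs of columns until they are mutually orthogonal; the column norms are then the singular values. This is not ISA itself: the abstract does not specify how ISA simplifies the orthogonalization or what its mapping formula is, so neither is reproduced here. The function name, the tolerance `tol`, and the sweep limit `max_sweeps` are assumptions made for illustration.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=50):
    """Return the singular values of matrix `a` (list of rows), largest first.

    Textbook one-sided Jacobi: an illustrative baseline, not the ISA algorithm.
    """
    w = [list(map(float, row)) for row in a]  # work on a copy
    m, n = len(w), len(w[0])
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Squared norms and inner product of columns p and q.
                alpha = sum(w[i][p] ** 2 for i in range(m))
                beta = sum(w[i][q] ** 2 for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                # Skip pairs that are already orthogonal to within tol.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta)
                )
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    wp, wq = w[i][p], w[i][q]
                    w[i][p] = c * wp - s * wq
                    w[i][q] = s * wp + c * wq
        if not rotated:
            break  # a full sweep found every pair orthogonal
    # Norms of the orthogonalized columns are the singular values.
    norms = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(norms, reverse=True)
```

Calling `one_sided_jacobi_singular_values([[1, 1], [0, 1]])` returns two values close to 1.618 and 0.618. Each sweep visits every column pair, which suggests why a cheaper, imprecise orthogonalization would be attractive for large term-document matrices.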