A study of semi-discrete matrix decomposition for LSI in automated text categorization

Authors:
Wang Qiang; Wang XiaoLong; Guan Yi
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, Harbin; School of Computer Science and Technology, Harbin Institute of Technology, Harbin; School of Computer Science and Technology, Harbin Institute of Technology, Harbin
Venue:
IJCNLP'04 Proceedings of the First International Joint Conference on Natural Language Processing
Year:
2004

Cites (2):
A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software (TOMS)
Cited by (1):
Text classification: a recent overview. ICCOMP'05 Proceedings of the 9th WSEAS International Conference on Computers

Abstract

This paper proposes Latent Semantic Indexing (LSI), computed with the semi-discrete matrix decomposition (SDD), for text categorization. The SDD is a recent way of computing LSI that can achieve similar performance at a much lower storage cost. Here, LSI serves text categorization by constructing new features of categories as combinations or transformations of the original features. In experiments on a Chinese Library Classification data set, we compare the accuracy of k-Nearest Neighbor (k-NN) classification based on LSI with that of a k-NN classifier; the results show that k-NN based on LSI is sometimes significantly better. Much future work remains, but the results indicate that LSI is a promising technique for text categorization.
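
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It approximates a term-by-document matrix A by X diag(d) Y^T, where X and Y contain only -1, 0 and 1 (which is why the factors can be stored compactly), then uses X^T v as the new features of a document v and classifies with cosine-similarity k-NN. The function names, the choice of starting vector, the fixed number of inner iterations, the projection X^T v, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def best_ternary(s):
    """Return x with entries in {-1, 0, 1} maximizing (x.s)^2 / (x.x)."""
    mags = np.abs(s)
    order = np.argsort(-mags)                      # largest magnitudes first
    scores = np.cumsum(mags[order]) ** 2 / np.arange(1, len(s) + 1)
    j = int(np.argmax(scores)) + 1                 # best number of nonzeros
    x = np.zeros(len(s))
    x[order[:j]] = np.sign(s[order[:j]])
    return x

def sdd(A, k, inner_iters=10):
    """Greedy rank-k semi-discrete decomposition A ~ X @ diag(d) @ Y.T."""
    R = np.asarray(A, dtype=float).copy()          # residual matrix
    m, n = R.shape
    X, Y, d = np.zeros((m, k)), np.zeros((n, k)), np.zeros(k)
    for t in range(k):
        if not R.any():                            # nothing left to fit
            break
        # Assumption: start from the column with the largest residual norm.
        y = np.zeros(n)
        y[np.argmax((R ** 2).sum(axis=0))] = 1.0
        for _ in range(inner_iters):               # alternate between x and y
            x = best_ternary(R @ y)
            y = best_ternary(R.T @ x)
        d[t] = (x @ R @ y) / ((x @ x) * (y @ y))
        X[:, t], Y[:, t] = x, y
        R -= d[t] * np.outer(x, y)
    return X, d, Y

def knn_predict(train_feats, labels, query_feat, k=3):
    """Majority vote among the k training rows most cosine-similar to the query."""
    norms = np.linalg.norm(train_feats, axis=1) * np.linalg.norm(query_feat)
    sims = train_feats @ query_feat / np.maximum(norms, 1e-12)
    nearest = np.argsort(-sims)[:k]
    values, counts = np.unique(np.asarray(labels)[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical toy data: 6 terms (rows) by 4 documents (columns), two categories.
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2],
              [0, 0, 1, 1]], dtype=float)
labels = ["a", "a", "b", "b"]

X, d, Y = sdd(A, k=2)
train_feats = (X.T @ A).T                          # one reduced feature row per document
query = np.array([1, 1, 0, 0, 0, 0], dtype=float)  # unseen document using terms 0 and 1
pred = knn_predict(train_feats, labels, X.T @ query, k=3)
print(pred)
```

Each new feature here is a signed sum of original term weights, which is one concrete reading of "combinations or transformations of the original features". A real experiment would use a weighted term-document matrix and a much larger k.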