Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Supervised latent semantic indexing using adaptive sprinkling
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Latent Semantic Indexing (LSI) has been successfully applied to information retrieval and text classification. However, when LSI is used for classification, some features that are important for small classes may be ignored because of their small feature values. To solve this problem, we propose the latent semantic classification (LSC) model, which extends the LSI model as follows: the classification information of the training documents is introduced into the latent semantic structure via a second set of latent variables, so that both indexing and classification information are taken into account during classification. Our experiments on Reuters show that the new model outperforms existing classification methods such as kNN and SVM.
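The problem the abstract describes can be illustrated with a small sketch of plain LSI. This is not the LSC model, and it is not taken from the paper. The toy term-document matrix, the terms, and the helper names (`top_eigenvectors`, `lsi_basis`, `latent_norm`) are all assumptions made for illustration. The latent basis is computed by power iteration on the term Gram matrix, which is just one simple way to obtain the leading singular directions. With one latent dimension, the document from the small class ends up with an essentially empty latent representation.

```python
import math

def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenvectors(S, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(S)
    S = [row[:] for row in S]  # work on a copy; deflation modifies it
    vecs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(S, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no remaining direction with nonzero eigenvalue
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(S, v)))
        vecs.append(v)
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                S[i][j] -= lam * v[i] * v[j]
    return vecs

def lsi_basis(A, k):
    """Leading k left singular vectors of the terms x documents matrix A."""
    n = len(A)
    gram = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)]
            for i in range(n)]
    return top_eigenvectors(gram, k)

def latent_norm(basis, doc):
    """Length of a document vector after projection onto the latent basis."""
    coords = [sum(u * d for u, d in zip(vec, doc)) for vec in basis]
    return math.sqrt(sum(c * c for c in coords))

# Toy data (assumed): rows are terms, columns are documents.
# Four documents belong to a large "oil" class, one to a small "wheat" class.
A = [
    [2, 1, 2, 1, 0],  # oil
    [1, 2, 1, 2, 0],  # price
    [1, 1, 1, 1, 0],  # barrel
    [0, 0, 0, 0, 1],  # wheat
]
oil_doc = [row[0] for row in A]
wheat_doc = [row[4] for row in A]

basis1 = lsi_basis(A, 1)  # aggressive truncation
basis3 = lsi_basis(A, 3)  # enough dimensions to reach the small class

print("k=1 oil  :", latent_norm(basis1, oil_doc))
print("k=1 wheat:", latent_norm(basis1, wheat_doc))
print("k=3 wheat:", latent_norm(basis3, wheat_doc))
```

In this toy setup the direction carrying the small class has a small singular value, so truncation discards it even though it is the only feature that identifies that class. Per the abstract, LSC addresses this by bringing class information into the latent structure. The sketch only shows the failure mode, not that remedy.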