Latent Semantic Indexing (LSI) is a widely used feature extraction method in text classification. LSI identifies global features that are important across all classes, but it may ignore local features that matter only for small classes, which leads to poor performance on those classes. To solve this problem, a novel method based on Partial Least Squares (PLS) analysis is proposed that integrates class information into the latent classification structure. Features are extracted according to both their power to describe document contents, as in LSI, and their capacity to discriminate between classes. The extracted features are applied to several classification algorithms: SVM, kNN, C4.5 and SMO. Experiments on Reuters show that the features extracted by our method outperform those extracted by LSI in all cases, with the largest gains on small classes.
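The contrast between unsupervised and class-aware feature extraction can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a tiny hypothetical document-term matrix, takes LSI to be a plain truncated SVD of that matrix, and stands in for PLS with the simple PLS-SVD variant (singular vectors of the centered cross-product between terms and one-hot class labels). The function names `lsi_directions` and `pls_directions` and all data values are invented for illustration.

```python
import numpy as np

# Hypothetical toy document-term counts (rows: documents, columns: terms).
# Terms 0 and 1 are frequent "global" terms; term 2 is a rare term that
# occurs only in the two documents of the small class C.
X = np.array([
    [8, 0, 0], [9, 1, 0], [7, 0, 0], [8, 1, 0], [9, 0, 0],  # class A
    [0, 8, 0], [1, 9, 0], [0, 7, 0], [1, 8, 0], [0, 9, 0],  # class B
    [4, 4, 1], [4, 5, 1],                                    # class C (small)
], dtype=float)
labels = np.array([0] * 5 + [1] * 5 + [2] * 2)
Y = np.eye(3)[labels]  # one-hot class indicator matrix, shape (12, 3)


def lsi_directions(X, k):
    """Top-k term-space directions from a truncated SVD of X (no labels used)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T  # shape (n_terms, k)


def pls_directions(X, Y, k):
    """Top-k term-space directions from the SVD of the centered X^T Y.

    This is the PLS-SVD variant: it maximizes covariance between projected
    documents and class indicators, without the deflation steps of full PLS.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return u[:, :k]  # shape (n_terms, k)


lsi = lsi_directions(X, 2)
pls = pls_directions(X, Y, 2)

# Largest absolute weight given to the rare, class-C-specific term (index 2)
# among the two extracted directions of each method.
print("LSI weight on rare term:", np.abs(lsi[2]).max())
print("PLS weight on rare term:", np.abs(pls[2]).max())
```

On this toy data the two leading LSI directions give the rare term almost no weight, because it contributes little to the overall variation in term counts, whereas one of the two label-aware directions is dominated by it. Projecting documents with `X @ pls` would therefore keep the small class separable in two dimensions, which is the kind of effect the abstract attributes to using class information.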