Text classification based on partial least square analysis

Authors:
Xue-Qiang Zeng;Ming-Wen Wang;Jian-Yun Nie
Affiliations:
Shanghai University, Shanghai, China and Nanchang University, Nanchang, China;Jiangxi Normal University, Nanchang, China;DIRO, Université de Montréal, Montreal Quebec, Canada
Venue:
Proceedings of the 2007 ACM symposium on Applied computing
Year:
2007

Abstract

Latent Semantic Indexing (LSI) is a popular feature extraction method for text classification. However, LSI identifies the features that are globally important across all classes, so locally important features of small classes may be ignored, which leads to poor performance on those classes. To solve this problem, we propose a novel method based on Partial Least Squares (PLS) analysis that integrates class information into the latent classification structure. Features are extracted according to both their power to describe document contents, as in LSI, and their capacity to discriminate between classes. The extracted features are applied to several classification algorithms: SVM, kNN, C4.5 and SMO. Experiments on Reuters show that the features extracted by our method outperform those extracted by LSI in all cases. In particular, the gain is most apparent on small classes.
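
The contrast between the two extraction criteria can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, a hypothetical toy document-term matrix, column centering for both methods, and the textbook characterization of the first PLS weight vector as the top left singular vector of the cross-product between the centered document-term matrix and the centered class-indicator matrix. The function names `lsi_direction` and `pls_direction` are illustrative only.

```python
import numpy as np


def lsi_direction(X):
    """First LSI-style direction: top right singular vector of the
    centered document-term matrix (ignores class labels)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def pls_direction(X, Y):
    """First PLS weight vector (assumed SVD formulation): top left
    singular vector of Xc^T Yc, i.e. the term-space direction whose
    document scores have maximal covariance with the class indicators."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return u[:, 0]


# Hypothetical toy corpus: 6 documents, 2 terms.
# Term 0 varies a lot but is unrelated to the classes.
# Term 1 varies little but marks the small class (documents 4 and 5).
X = np.array(
    [[5, 0], [0, 0], [5, 0], [0, 0], [5, 1], [0, 1]], dtype=float
)
labels = np.array([0, 0, 0, 0, 1, 1])
Y = np.eye(2)[labels]  # one-hot class-indicator matrix

w_lsi = lsi_direction(X)
w_pls = pls_direction(X, Y)

# Index of the term with the largest absolute weight in each direction.
top_lsi_term = int(np.argmax(np.abs(w_lsi)))
top_pls_term = int(np.argmax(np.abs(w_pls)))

# One-dimensional PLS feature for each document.
scores_pls = (X - X.mean(axis=0)) @ w_pls

print("LSI emphasizes term", top_lsi_term)
print("PLS emphasizes term", top_pls_term)
```

On this toy data the LSI direction loads on the high-variance term 0, while the PLS direction loads on term 1, the only term that distinguishes the small class. Real document collections would need term weighting and more than one component, which this sketch leaves out.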