Theme word subspace method for text document categorization

Authors:
Zhou Xiaofei; Guo Li; Tan Jianlong; Jiang Wenhan
Affiliations:
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; First Research Institute of Ministry of Public Security, Beijing, China
Venue:
DM-IKM '12 Proceedings of the Data Mining and Intelligent Knowledge Management Workshop
Year:
2012

References: 15
Cited by: 0

A re-examination of text categorization methods. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.

Probabilistic latent semantic indexing. Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval.

Machine learning in automated text categorization. ACM Computing Surveys (CSUR).

Discriminant Waveletfaces and Nearest Feature Classifiers for Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.

A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning.

Detecting Concept Drift with Support Vector Machines. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.

Latent Dirichlet allocation. The Journal of Machine Learning Research.

Classification of TV Sports News by DCT Features Using Multiple Subspace Method. ICPR '98 Proceedings of the 14th International Conference on Pattern Recognition, Volume 2.

A novel refinement approach for text categorization. Proceedings of the 14th ACM international conference on Information and knowledge management.

Text categorization via generalized discriminant analysis. Information Processing and Management: an International Journal.

A class-feature-centroid classifier for text categorization. Proceedings of the 18th international conference on World wide web.

A New Kernel-Based Classification Algorithm. ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining.

Credit risk evaluation with kernel-based affine subspace nearest points learning method. Expert Systems with Applications: An International Journal.

Subspace Distance-Based Sampling Method for SVM. ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops.

Multinomial naive Bayes for text categorization revisited. AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence.

Abstract

In this paper, we present a text document categorization method called Theme Word Subspace (TWS) learning, which classifies documents using theme words that jointly express the semantic information of a class. For each class corpus, theme words with high probability in the topic structure are first extracted; these words then serve as theme elements that span a class subspace, jointly representing the semantics and distribution of the class. To categorize a test document, TWS assigns it to the nearest subspace, that is, the class whose theme words best represent the document. TWS measures the distance from a test document to each subspace using either the L1 norm or the L2 norm. In experiments on a large Chinese text corpus, the proposed TWS learning methods achieve comparable performance in text document categorization.
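As an illustration only, the pipeline described in the abstract can be sketched in Python under several assumptions the abstract does not state. Documents are pre-tokenized word lists. A plain unigram estimate of P(w | class) stands in for the topic-structure probabilities from which TWS extracts theme words. Each class subspace is taken to be the coordinate subspace spanned by the unit vectors of its top-k theme words, and a document is a term-frequency vector normalized by the same norm used for the distance. Under these assumptions, the L1 or L2 distance from a document to a class subspace reduces to the norm of its off-theme coordinates. The helper names `extract_theme_words`, `subspace_distance` and `classify`, the parameter `k`, and the toy data are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def extract_theme_words(class_docs, k):
    """Return the k most probable words of one class (hypothetical helper).

    ASSUMPTION: a unigram estimate P(w | class) = count(w) / total stands in
    for the topic-structure probabilities (e.g. PLSA or LDA) used by TWS.
    Dividing by the class total does not change the ranking, so raw counts
    are ranked directly; ties are broken alphabetically.
    """
    counts = Counter(word for doc in class_docs for word in doc)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def subspace_distance(doc, theme_words, norm="l2"):
    """Distance from a normalized term-frequency vector to the coordinate
    subspace spanned by a class's theme words (ASSUMED subspace model).

    The closest point of that subspace keeps the theme-word coordinates, so
    under either norm the residual is the document's off-theme coordinates.
    """
    tf = Counter(doc)
    if not tf:
        raise ValueError("empty document")
    if norm == "l1":
        total = sum(tf.values())
        off_theme = sum(c for w, c in tf.items() if w not in theme_words)
        return off_theme / total
    if norm == "l2":
        length = math.sqrt(sum(c * c for c in tf.values()))
        off_theme = sum(c * c for w, c in tf.items() if w not in theme_words)
        return math.sqrt(off_theme) / length
    raise ValueError("norm must be 'l1' or 'l2'")


def classify(doc, class_subspaces, norm="l2"):
    """Assign doc to the class whose theme-word subspace is nearest."""
    return min(class_subspaces,
               key=lambda c: subspace_distance(doc, class_subspaces[c], norm))


# Toy usage with made-up, pre-tokenized training documents.
train = {
    "sports": [["match", "team", "goal", "win"], ["team", "coach", "goal"]],
    "finance": [["stock", "market", "price"], ["market", "bank", "stock", "rate"]],
}
subspaces = {c: extract_theme_words(docs, k=3) for c, docs in train.items()}
test_doc = ["team", "goal", "market"]
print(classify(test_doc, subspaces, norm="l1"))  # sports
print(classify(test_doc, subspaces, norm="l2"))  # sports
```

Because the closest point of a coordinate subspace simply keeps the theme-word coordinates, the two norms differ here only in how they weigh the leftover off-theme mass. If theme words were instead represented by dense vectors (a non-coordinate basis), the L2 distance would require a least-squares projection and the L1 distance a linear program.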