Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
BoosTexter: A Boosting-based System for Text Categorization
Machine Learning - Special issue on information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Introduction to Modern Information Retrieval
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Chinese Documents Classification Based on N-Grams
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Multi-dimensional text classification
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
In Text Categorization (TC) based on the Vector Space Model, feature weighting and feature selection are major challenges. This paper proposes two feature-weighting methods that combine the relevant influencing factors. A TC system for Chinese texts is designed using character bigrams as features. Experiments on a collection of 71,674 documents show that the system achieves an F1 score of 85.9%, about 5% higher than that of the well-known TF*IDF weighting scheme. In addition, a multi-step feature selection process is used to reduce the dimensionality of the feature space effectively.
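The abstract does not specify the two proposed weighting methods or the steps of the feature selection process, so the sketch below reproduces only the parts it names concretely: character-bigram features, the TF*IDF baseline it compares against, and the F1 metric. It is an illustration under stated assumptions, not the paper's implementation. The function names (`char_bigrams`, `tfidf_vectors`, `f1_score`) are hypothetical. Dropping whitespace before extracting bigrams, using raw term frequency, and using the `log(N / df)` IDF variant are also assumptions, since many TF*IDF variants exist.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Overlapping character bigrams. Dropping whitespace first is an assumption."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Baseline TF*IDF over character bigrams: raw tf times log(N / df) (assumed variant)."""
    bigram_counts = [Counter(char_bigrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each bigram.
    df = Counter()
    for counts in bigram_counts:
        df.update(counts.keys())

    # Weight every bigram occurrence count by its inverse document frequency.
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for counts in bigram_counts
    ]


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R) from true/false positive and false negative counts."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `char_bigrams("中文文本")` yields `["中文", "文文", "文本"]`. Bigrams that occur in every document of the collection receive an IDF of `log(1) = 0`. This is one reason plain TF*IDF can underweight informative features, and it is the weakness that alternative weighting schemes typically target.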