Classifying news stories using memory based reasoning
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
An example-based mapping method for text categorization and retrieval
ACM Transactions on Information Systems (TOIS)
Towards language independent automated learning of text categorization models
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Improved boosting algorithms using confidence-rated predictions
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
WAIM '00 Proceedings of the First International Conference on Web-Age Information Management
Chinese Documents Classification Based on N-Grams
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Hi-index | 0.00 |
Text classification is becoming more and more important with the rapid growth of on-line information available. It was observed that the quality of training corpus impacts the performance of the trained classifier. This paper proposes an approach to build high-quality training corpuses for better classification performance by first exploring the properties of training corpuses, and then giving an algorithm for constructing training corpuses semi-automatically. Preliminary experimental results validate our approach: classifiers based on the training corpuses constructed by our approach can achieve good performance while the training corpus' size is significantly reduced. Our approach can be used for building efficient and lightweight classification systems.