This paper proposes an approach to improving document classification with domain knowledge. We first introduce a domain knowledge dictionary, NEUKD, and propose two models that use domain knowledge as textual features for text categorization. The first, the BOTW model, uses domain associated terms together with conventional words as textual features. The second, the BOF model, uses domain features as textual features. Because the size of the domain knowledge dictionary is limited, we apply a machine learning technique to learn new features and propose the BOL model, which can be regarded as an extended version of the BOF model. In the comparison experiments, a naïve Bayes system based on the BOW model serves as the baseline. Results for naïve Bayes systems built on the four models (BOW, BOTW, BOF and BOL) show that domain knowledge is very useful for improving text categorization. The BOTW model performs better than the BOW model, and the BOL and BOF models perform better than the BOW model when the number of features is small. By learning new features with the machine learning technique, the BOL model performs better than the BOF model.
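The contrast between the BOW baseline and a BOF-style representation can be illustrated with a minimal sketch. It is not the paper's implementation. The dictionary `TOY_DICT`, the training sentences, the class labels, and the helper names `bow_features`, `bof_features`, `NaiveBayes` and `train` are all hypothetical, invented here for illustration. The actual contents of NEUKD and the exact naïve Bayes variant used in the experiments are not given in the abstract. The sketch assumes a multinomial naïve Bayes with add-one smoothing and whitespace tokenization. It does not cover the BOTW or BOL models.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a domain knowledge dictionary: it maps a domain
# associated term to a domain feature. These entries are invented; they are
# not taken from NEUKD.
TOY_DICT = {
    "goalkeeper": "football",
    "striker": "football",
    "penalty": "football",
    "dividend": "finance",
    "shares": "finance",
    "stock": "finance",
}


def bow_features(tokens):
    """BOW-style representation: every word is a feature."""
    return list(tokens)


def bof_features(tokens, dictionary=TOY_DICT):
    """BOF-style representation: keep only the domain features of the
    tokens that appear in the dictionary."""
    return [dictionary[t] for t in tokens if t in dictionary]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in feats:
                if f in self.vocab:  # skip features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train(feature_fn, corpus):
    """Fit a classifier on (text, label) pairs using the given feature map."""
    docs = [feature_fn(text.split()) for text, _ in corpus]
    labels = [label for _, label in corpus]
    return NaiveBayes().fit(docs, labels)


# Invented toy corpus for illustration only.
TRAIN = [
    ("the striker scored a penalty", "sports"),
    ("the goalkeeper saved the penalty", "sports"),
    ("shares paid a dividend", "business"),
    ("the stock and shares fell", "business"),
]

bow_model = train(bow_features, TRAIN)
bof_model = train(bof_features, TRAIN)

query = "the goalkeeper blocked the striker".split()
print(len(bow_model.vocab), bow_model.predict(bow_features(query)))
print(len(bof_model.vocab), bof_model.predict(bof_features(query)))
```

In this toy setting the BOF-style model works with a much smaller feature space than the BOW model, because many distinct terms collapse onto one domain feature. This is the regime the abstract refers to when it compares the models with a small number of features. The sketch also shows the limitation that motivates the BOL model: a term missing from the dictionary contributes nothing to the BOF representation.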