HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Document classification provides an effective way to handle the explosive growth of online textual data. In practical classification settings, however, we face the so-called feature sparsity problem, caused by a lack of training documents or by the shortness of the text to be classified. In this paper, we address the sparsity problem by exploiting term relationships together with Naive Bayes classifiers. The first method estimates term relationships from the co-occurrence of two terms in a given context. The second method estimates term relationships from the distribution of terms over the hierarchical categories of a publicly available document taxonomy. The estimated term relationships are then used to augment Naive Bayes classifiers. We test our methods on two open-domain data sets to demonstrate their advantages. The experimental results show that our methods can significantly improve classification performance, especially when training data is scarce or the texts to be classified are Web search queries.
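The first method can be illustrated with a minimal sketch. It is not the paper's actual formulation, which the abstract does not give. It assumes that the "context" is a whole document, that a term relationship is a normalized co-occurrence frequency, and that augmentation means mixing each class's raw term counts with counts propagated to related terms. The function names, the mixing weight `lam`, and the Laplace constant `alpha` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_relations(docs):
    """Estimate P(u | t) from document-level co-occurrence (assumed context)."""
    pair_counts = defaultdict(Counter)
    for tokens in docs:
        terms = set(tokens)
        for t in terms:
            for u in terms:
                if u != t:
                    pair_counts[t][u] += 1
    relations = {}
    for t, counts in pair_counts.items():
        total = sum(counts.values())
        relations[t] = {u: c / total for u, c in counts.items()}
    return relations


def train_augmented_nb(labeled_docs, relations, lam=0.5, alpha=1.0):
    """Multinomial Naive Bayes with term counts spread over related terms.

    lam (hypothetical) is the share of each count passed on to related terms;
    alpha (hypothetical) is the usual Laplace smoothing constant.
    """
    class_docs = Counter()
    raw = defaultdict(Counter)
    for tokens, label in labeled_docs:
        class_docs[label] += 1
        raw[label].update(tokens)

    # Vocabulary covers labeled terms plus every term seen in the relations.
    vocab = set(relations)
    for related in relations.values():
        vocab.update(related)
    for counts in raw.values():
        vocab.update(counts)

    n_docs = sum(class_docs.values())
    model = {}
    for label, counts in raw.items():
        mixed = defaultdict(float)
        for t, c in counts.items():
            mixed[t] += (1.0 - lam) * c
            for u, p in relations.get(t, {}).items():
                mixed[u] += lam * c * p
        total = sum(mixed.values()) + alpha * len(vocab)
        log_like = {w: math.log((mixed.get(w, 0.0) + alpha) / total) for w in vocab}
        model[label] = (math.log(class_docs[label] / n_docs), log_like)
    return model


def classify(model, tokens):
    """Return the label with the highest log posterior; unknown terms are skipped."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)
    return max(model, key=score)


# Toy data, invented purely for illustration.
unlabeled = [
    "football soccer league".split(),
    "soccer goal match".split(),
    "stock shares market".split(),
    "shares price trading".split(),
]
labeled = [
    ("football match goal".split(), "sports"),
    ("stock market price".split(), "finance"),
]
relations = cooccurrence_relations(unlabeled)
model = train_augmented_nb(labeled, relations)
print(classify(model, ["soccer"]))
```

In this toy setup the one-word query "soccer" never occurs in the labeled documents, so a plain Naive Bayes model would score both classes equally. Because "soccer" co-occurs with "football", "goal", and "match" in the unlabeled text, the augmented counts favor the sports class, which is the kind of short-query case the abstract highlights.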