Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the contexts (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA may easily work with the various kinds of text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither large amounts of memory nor domain-specific knowledge. Experimental results show that CTFA may successfully enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on Reuters and Newsgroups data.
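The abstract does not give the CTFA formula, so the following is only a minimal sketch of the architecture it describes: a context-based step that re-assesses the term frequencies of a test document before a frequency-based classifier sees them. It assumes a Rocchio classifier with cosine similarity (beta = 16 and gamma = 4 are assumed values). The reassessment function `reassess_tf` is hypothetical and invented for illustration; it is not the authors' method. In this heuristic, an occurrence counts more when its neighbors within a small window co-occurred with it in some training document.

```python
import math
from collections import Counter, defaultdict

def tf(tokens):
    """Raw term frequencies of a token list."""
    return Counter(tokens)

def cooccurrence_sets(train_docs):
    """Map each term to the set of terms it shares a training document with."""
    co = defaultdict(set)
    for tokens, _label in train_docs:
        vocab = set(tokens)
        for t in vocab:
            co[t] |= vocab - {t}
    return co

def reassess_tf(tokens, co, window=2):
    """HYPOTHETICAL context-based reassessment (not the CTFA formula):
    each occurrence counts 0.5 plus 0.5 times the fraction of its
    neighbours (within `window` positions) that co-occurred with it
    in at least one training document."""
    freq = Counter()
    for i, t in enumerate(tokens):
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        known = co.get(t, set())
        support = sum(n in known for n in neighbours) / len(neighbours) if neighbours else 0.0
        freq[t] += 0.5 + 0.5 * support
    return freq

def rocchio_prototypes(train_docs, beta=16.0, gamma=4.0):
    """Rocchio prototype per category: beta * mean positive TF vector
    minus gamma * mean negative TF vector."""
    protos = {}
    for c in {label for _, label in train_docs}:
        pos = [tf(toks) for toks, lab in train_docs if lab == c]
        neg = [tf(toks) for toks, lab in train_docs if lab != c]
        proto = defaultdict(float)
        for vec in pos:
            for t, v in vec.items():
                proto[t] += beta * v / len(pos)
        for vec in neg:
            for t, v in vec.items():
                proto[t] -= gamma * v / len(neg)
        protos[c] = dict(proto)
    return protos

def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(tokens, protos, freq_fn=tf):
    """Pick the category whose prototype is most similar to the
    (possibly re-assessed) term-frequency vector of the test document."""
    vec = freq_fn(tokens)
    return max(protos, key=lambda c: cosine(vec, protos[c]))

# Toy usage with made-up training data.
train = [
    ("stock market shares rise".split(), "finance"),
    ("bank interest rate market".split(), "finance"),
    ("team wins match goal".split(), "sport"),
    ("player scores goal in match".split(), "sport"),
]
protos = rocchio_prototypes(train)
co = cooccurrence_sets(train)
test = "market shares fall after match".split()
print(classify(test, protos))                                          # finance
print(classify(test, protos, freq_fn=lambda toks: reassess_tf(toks, co)))  # finance
```

The design choice illustrated here is that the reassessment is only a function from tokens to a frequency vector. Swapping `freq_fn` therefore leaves the classifier untouched, which mirrors the abstract's claim that such a step can sit in front of any classifier driven by term frequencies. In the toy run, the isolated "match" occurrence is down-weighted to 0.5 because none of its neighbors support it, while "market" keeps more weight.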