Context-Based Term Frequency Assessment for Text Classification

Authors:
Rey-Long Liu
Affiliations:
Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, R.O.C.
Venue:
PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Year:
2008

Abstract

Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the contexts (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA can easily work with the various kinds of text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither large amounts of memory nor domain-specific knowledge. Experimental results show that CTFA can successfully enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on Reuters and Newsgroups data.
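
The general idea of re-assessing term frequencies from neighboring terms can be illustrated with a small sketch. The abstract does not give the CTFA formula, so everything below is an assumption made for illustration only: the function name `reassess_term_frequencies`, the fixed `window` size, the hypothetical `context_terms` set (terms taken to be typical of one category), and the weighting rule (the fraction of neighbors found in that set) are stand-ins, not the paper's method.

```python
from collections import Counter


def reassess_term_frequencies(tokens, context_terms, window=2):
    """Toy context-weighted term frequency (NOT the paper's formula).

    Each occurrence of a term contributes a weight between 0 and 1: the
    fraction of its neighbours (within `window` positions on either side)
    that appear in `context_terms`, a hypothetical set of terms typical
    of one category.
    """
    weighted = Counter()
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        neighbours = tokens[lo:i] + tokens[i + 1:hi]
        if not neighbours:
            # Single-token document: no context, keep the raw count.
            weighted[term] += 1.0
            continue
        support = sum(1 for n in neighbours if n in context_terms)
        weighted[term] += support / len(neighbours)
    return dict(weighted)


# Hypothetical test document and category context (assumed for illustration).
doc = ["bank", "interest", "rate", "river", "bank"]
finance_context = {"interest", "rate", "loan"}
reassessed = reassess_term_frequencies(doc, finance_context, window=1)
print(reassessed)
```

In this toy input the second occurrence of `bank` has only `river` as a neighbor, so it contributes nothing, and the reassessed frequency of `bank` for the assumed finance category falls from a raw count of 2 to 1.0. Because the output is still a term-to-frequency mapping, it could be passed to any frequency-based classifier in place of raw counts, which is the compatibility property the abstract describes.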