Context-Based Term Frequency Assessment for Text Classification

  • Authors: Rey-Long Liu
  • Affiliations: Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, R.O.C.
  • Venue: PRICAI '08 Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
  • Year: 2008

Abstract

Automatic text classification (TC) is a fundamental component of information processing and management. To classify a document d properly, it is essential to identify the semantics of each term t in d, and those semantics depend heavily on the context (neighboring terms) of t in d. In this paper, we present CTFA (Context-based Term Frequency Assessment), a technique that improves text classifiers by considering term contexts in test documents. The results of term context recognition are used to re-assess term frequencies, so CTFA can easily work with the various kinds of text classifiers that base their TC decisions on term frequencies. Moreover, CTFA is efficient and requires neither large amounts of memory nor domain-specific knowledge. Experimental results show that CTFA can successfully enhance the performance of Rocchio and SVM (Support Vector Machine) classifiers on Reuters and Newsgroups data.
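
The abstract does not give CTFA's actual scoring rule, so the following is only a minimal sketch of the general idea of re-assessing term frequencies from neighboring terms. It is not the paper's algorithm. The function name `reassess_term_frequencies`, the `category_terms` set, the `window` parameter, and the "1 + support" weighting are all assumptions made for illustration.

```python
from collections import defaultdict


def reassess_term_frequencies(tokens, category_terms, window=2):
    """Return context-weighted term frequencies for one tokenized document.

    Hypothetical weighting (an assumption, not taken from the paper):
    each occurrence of a term counts as 1, plus the fraction of its
    neighbors within `window` positions that belong to `category_terms`.
    """
    weighted_tf = defaultdict(float)
    for i, term in enumerate(tokens):
        # Collect neighbors on both sides, excluding the term itself.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        neighbors = left + right
        if neighbors:
            support = sum(1 for n in neighbors if n in category_terms) / len(neighbors)
        else:
            support = 0.0
        # Occurrences in a category-related context get a larger weight.
        weighted_tf[term] += 1.0 + support
    return dict(weighted_tf)


# Illustrative input: "bank" appears once in a finance-like context
# and once next to "river".
doc = ["bank", "loan", "interest", "river", "bank"]
finance_terms = {"bank", "loan", "interest"}
tf = reassess_term_frequencies(doc, finance_terms, window=1)
```

Because the output is still a mapping from terms to (re-weighted) frequencies, it could in principle be passed to any classifier that consumes term frequencies, which is the property the abstract highlights for Rocchio and SVM classifiers.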