Text categorization using hybrid (mined) terms (poster session)

  • Authors:
  • C. K. P. Wong;R. W. P. Luk;K. F. Wong;K. L. Kwok

  • Affiliations:
  • Chinese University of Hong Kong, Dept. of Systems Eng. and Eng. Management, Shatin, Hong Kong;-;-;Queens College, CUNY, Dept. of Computer Science, New York

  • Venue:
  • IRAL '00 Proceedings of the fifth international workshop on Information retrieval with Asian languages
  • Year:
  • 2000

Abstract

This paper evaluated text categorization using characters, bigrams, words and hybrid terms. These terms were also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. The use of data mining techniques to add new terms to the dictionary improved the performance of character-based classifiers. Our naïve comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the best precision (77%), while our best classifier has the best recall (72%) and the lowest storage requirement (13%).