Rule-based word clustering for text classification

  • Authors:
  • Hui Han; Eren Manavoglu; C. Lee Giles; Hongyuan Zha

  • Affiliations:
  • The Pennsylvania State University, University Park, PA (all four authors)

  • Venue:
  • Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2003

Abstract

This paper introduces a rule-based, context-dependent word clustering method whose rules are derived from various domain databases and from the orthographic properties of the word text. Besides significantly reducing dimensionality, such rule-based word clustering, in our experiments, improves the overall accuracy of extracting bibliographic fields from references by 8%, and improves the class-specific performance of line classification on document headers by 18.32% on average.
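The abstract does not list the rules themselves. As a minimal sketch, assuming small hypothetical lexicons (FIRST_NAMES, LAST_NAMES, MONTHS) in place of the domain databases and a few made-up orthographic patterns, rule-based word clustering can be read as replacing each token with a cluster label chosen by an ordered rule list (first match wins). One rule is context-dependent because it looks at the preceding token. None of the names, labels, or rules below come from the paper.

```python
import re

# Hypothetical stand-ins for the "domain databases"; the paper's real lexicons are not given.
FIRST_NAMES = {"hui", "eren", "hongyuan"}
LAST_NAMES = {"han", "manavoglu", "giles", "zha"}
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}

# Illustrative orthographic patterns (assumptions, not the paper's rules).
YEAR_RE = re.compile(r"^(19|20)\d{2}$")          # e.g. 2003
PAGE_RANGE_RE = re.compile(r"^\d+(-+\d+)?$")     # e.g. 12 or 12-19
INITIAL_RE = re.compile(r"^[A-Z]\.$")            # e.g. C.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def cluster_word(token: str, prev: str) -> str:
    """Map one token to a cluster label; rules are tried in order, first match wins."""
    low = token.lower()
    # Context-dependent rule: a number right after "pp." or "pages" is a page cluster.
    if prev in {"pp.", "pages"} and PAGE_RANGE_RE.match(token):
        return ":page:"
    if YEAR_RE.match(token):
        return ":year:"
    # Database (lexicon) lookups.
    if low in FIRST_NAMES:
        return ":firstname:"
    if low in LAST_NAMES:
        return ":lastname:"
    if low in MONTHS:
        return ":month:"
    # Orthographic rules.
    if EMAIL_RE.match(token):
        return ":email:"
    if INITIAL_RE.match(token):
        return ":initial:"
    if token.isdigit():
        return ":number:"
    # No rule fired: the lowercased word stays its own cluster.
    return low


def cluster_tokens(tokens: list[str]) -> list[str]:
    """Cluster a token sequence, passing each token's lowercased predecessor as context."""
    labels = []
    prev = ""
    for tok in tokens:
        labels.append(cluster_word(tok, prev))
        prev = tok.lower()
    return labels


# A made-up, pre-tokenized reference string used only for illustration.
example = ["C.", "L.", "Giles", ",", "Hui", "Han", ".", "July", "2003",
           ",", "pp.", "12-19", ",", "12"]
labels = cluster_tokens(example)
print(labels)
print(len(set(example)), "distinct tokens ->", len(set(labels)), "distinct clusters")
```

Many surface forms (every year, every page range, every listed surname) collapse into one label each, so a classifier sees a smaller feature vocabulary; this is one plausible reading of the dimensionality reduction the abstract mentions. The same token "12" is labeled `:page:` after "pp." but `:number:` elsewhere, which illustrates what "context-dependent" could mean here.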