Blog categorization exploiting domain dictionary and dynamically estimated domains of unknown words

  • Authors:
  • Chikara Hashimoto; Sadao Kurohashi

  • Affiliations:
  • Yamagata University, Yonezawa-shi, Yamagata, Japan; Kyoto University, Sakyo-ku, Kyoto, Japan

  • Venue:
  • HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
  • Year:
  • 2008

Abstract

This paper presents an approach to text categorization that (i) uses no machine learning and (ii) reacts to unknown words on the fly. These features are important for categorizing Blog articles, which are updated daily and full of newly coined words. We categorized 600 Blog articles into 12 domains, and our method achieved an accuracy of 94.0% (564/600).
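The abstract describes the method only at a high level. As a rough illustration of dictionary-based categorization without machine learning, here is a minimal sketch. It rests on several assumptions that are not taken from the paper. The toy dictionary and its domain labels are hypothetical and are not the authors' 12 domains. The article's domain is chosen by a simple majority vote over words. For an unknown word, the sketch estimates a domain on the fly by letting the known words in some context texts vote, and those texts come from a hypothetical `lookup_context` callback, for example retrieved snippets.

```python
from collections import Counter

# Hypothetical toy domain dictionary (word -> domain); NOT the authors' resource.
DOMAIN_DICT = {
    "goal": "SPORTS", "pitcher": "SPORTS", "stadium": "SPORTS",
    "recipe": "FOOD", "noodle": "FOOD", "bake": "FOOD",
    "vote": "POLITICS", "senate": "POLITICS",
}


def estimate_unknown_domain(word, context_texts):
    """Assumed estimator: known words in the context texts vote for a domain.

    Returns None when no known word appears in the contexts.
    """
    votes = Counter()
    for text in context_texts:
        for w in text.lower().split():
            if w in DOMAIN_DICT:
                votes[DOMAIN_DICT[w]] += 1
    return votes.most_common(1)[0][0] if votes else None


def categorize(article, lookup_context):
    """Majority vote over known words plus on-the-fly estimates for unknown ones.

    `lookup_context(word)` is a hypothetical callback returning a list of texts
    in which the unknown word occurs; no model is trained at any point.
    """
    votes = Counter()
    estimated = {}  # cache so each unknown word is estimated only once
    for w in article.lower().split():
        if w in DOMAIN_DICT:
            votes[DOMAIN_DICT[w]] += 1
            continue
        if w not in estimated:
            estimated[w] = estimate_unknown_domain(w, lookup_context(w))
        if estimated[w] is not None:
            votes[estimated[w]] += 1
    return votes.most_common(1)[0][0] if votes else None


# Usage with made-up context texts for the unknown word "gyroball".
CONTEXTS = {"gyroball": ["pitcher throws gyroball in stadium"]}
print(categorize("the pitcher threw a gyroball at the stadium",
                 lambda w: CONTEXTS.get(w, [])))  # prints SPORTS
```

Because unknown words get their domain from whatever context is available at categorization time, newly coined words can contribute votes without retraining anything. This is the property the abstract stresses for Blog articles, although the paper's actual estimation procedure may differ from this sketch.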