Short text classification using very few words

  • Authors:
  • Aixin Sun

  • Affiliations:
  • Nanyang Technological University, Singapore, Singapore

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a simple, scalable, and non-parametric approach for short text classification. Leveraging the well studied and scalable Information Retrieval (IR) framework, our approach mimics human labeling process for a piece of short text. It first selects the most representative and topical-indicative words from a given short text as query words, and then searches for a small set of labeled short texts best matching the query words. The predicted category label is the majority vote of the search results. Evaluated on a collection of more than 12K Web snippets, the proposed approach achieves comparable classification accuracy with the baseline Maximum Entropy classifier using as few as 3 query words and top-5 best matching search hits. Among the four query word selection schemes proposed and evaluated in our experiments, term frequency together with clarity gives the best classification accuracy.