Short text classification by detecting information path

  • Authors:
  • Shitao Zhang;Xiaoming Jin;Dou Shen;Bin Cao;Xuetao Ding;Xiaochen Zhang

  • Affiliations:
  • School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China;Baidu Corporation, Beijing, China;Microsoft Research Asia, Beijing, China;School of Software, Tsinghua University, Beijing, China;School of Software, Tsinghua University, Beijing, China

  • Venue:
  • Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year:
  • 2013

Abstract

Short text is becoming ubiquitous in many modern information systems. Because short texts are brief and sparse, they contain fewer informative word co-occurrences, which naturally poses great difficulty for classification tasks on such data. To overcome this difficulty, this paper proposes a new way to classify short texts effectively. Our method is based on a key observation: short texts usually contain ordered subsets, termed the "information path" in this work, and classifying each subset based on the classification results of some previous subsets can yield higher overall accuracy than classifying the entire data set directly. We propose a method to detect the information path and employ it in short text classification. Unlike state-of-the-art methods, our method does not require any external knowledge or corpus, which usually needs careful fine-tuning; this makes our method easier to use and more robust across different data sets. Experiments on two real-world data sets show the effectiveness of the proposed method and its superiority over existing methods.
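
The core idea in the abstract, classifying ordered subsets one after another and letting each step reuse the results of earlier steps, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not describe how the information path is detected, so the ordering of subsets is simply taken as given here, and the word-overlap scorer, the function names (`train_counts`, `predict`, `classify_along_path`), and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; any tokenizer would do)."""
    return text.lower().split()


def train_counts(labeled):
    """Build per-class word counts from (text, label) pairs."""
    counts = defaultdict(Counter)
    for text, label in labeled:
        counts[label].update(tokenize(text))
    return counts


def predict(counts, text):
    """Toy scorer: pick the class whose word counts overlap most with the text.

    Ties are broken alphabetically by label so the result is deterministic.
    """
    words = tokenize(text)
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(sorted(scores), key=lambda label: scores[label])


def classify_along_path(seed, ordered_subsets):
    """Classify subsets in the given order, feeding predictions forward.

    `ordered_subsets` stands in for a detected information path; how to
    detect that ordering is outside the scope of this sketch.
    """
    labeled = list(seed)
    predictions = []
    for subset in ordered_subsets:
        counts = train_counts(labeled)
        batch = [(text, predict(counts, text)) for text in subset]
        predictions.extend(batch)
        labeled.extend(batch)  # later subsets can use earlier predictions
    return predictions


# Hypothetical toy data: two labeled seeds and two ordered subsets.
seed = [("cheap flights paris", "travel"), ("python list sort", "code")]
subsets = [
    ["paris hotel deals", "sort dict python"],
    ["hotel booking", "dict comprehension"],
]
for text, label in classify_along_path(seed, subsets):
    print(f"{label}: {text}")
```

In this toy data, "hotel booking" shares no words with the labeled seeds, so a classifier trained on the seeds alone has no evidence for it. It becomes classifiable only after "paris hotel deals" has been labeled in the first subset, which is the kind of effect the abstract attributes to classifying along an ordered path.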