Improving short text classification using public search engines

  • Authors:
  • Wang Meng;Lin Lanfen;Wang Jing;Yu Penghua;Liu Jiaolong;Xie Fei

  • Affiliations:
  • College of Computer Science and Technology, Zhejiang University, Hangzhou, China (all authors)

  • Venue:
  • IUKM'13: Proceedings of the 2013 International Conference on Integrated Uncertainty in Knowledge Modelling and Decision Making
  • Year:
  • 2013

Abstract

In Web 2.0 applications, many of the texts provided by users are as short as 3 to 10 words. Good classification of these short texts can help readers find the messages they need more quickly. In this paper, we propose a method that expands short texts with the help of public search engines in two steps. First, we submit the short text as a query to a public search engine and crawl the result pages. Second, we treat the text in the result pages as background knowledge for the original short text and extract the feature vector from it. By choosing a suitable number of result pages, we can obtain a large enough corpus for feature vector extraction and thereby address the data sparseness problem. We conducted experiments under different settings, and the empirical results indicate that this enriched representation of short texts can substantially improve classification performance.
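
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `expand_short_text`, `feature_vector`, and `fake_search` are hypothetical, the search engine is replaced by an in-memory stub so the example runs offline, and plain term counts stand in for whatever feature weighting the paper actually uses. Prepending the original short text to the retrieved pages is also an assumption.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_short_text(
    short_text: str,
    search_fn: Callable[[str], List[str]],
    num_pages: int,
) -> str:
    """Step 1: query a search backend and keep the first num_pages results.

    search_fn is a hypothetical callable that returns the text of each
    result page; a real system would call a search engine and crawl pages.
    """
    pages = search_fn(short_text)[:num_pages]
    # Assumption: the original short text is kept alongside the result pages.
    return " ".join([short_text] + pages)


def feature_vector(text: str) -> Counter:
    """Step 2: build a bag-of-words term-count vector (an assumed weighting)."""
    return Counter(tokenize(text))


def fake_search(query: str) -> List[str]:
    """Offline stand-in for a public search engine (illustrative data only)."""
    return [
        "Apple unveils a new iPhone with a faster chip",
        "iPhone review: camera and battery life",
        "Smartphone market share for Apple this quarter",
    ]


original = feature_vector("new iphone")
enriched = feature_vector(expand_short_text("new iphone", fake_search, num_pages=2))

print(len(original), "distinct terms before expansion")
print(len(enriched), "distinct terms after expansion")
```

In this sketch the `num_pages` argument plays the role of the "number of result pages" mentioned in the abstract: raising it adds more background text and so yields a denser vector, at the cost of possibly pulling in less relevant pages.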