Improving retrieval of short texts through document expansion

  • Authors:
  • Miles Efron;Peter Organisciak;Katrina Fenlon

  • Affiliations:
  • University of Illinois, Urbana-Champaign, Champaign, IL, USA;University of Illinois, Urbana-Champaign, Champaign, IL, USA;University of Illinois, Urbana-Champaign, Champaign, IL, USA

  • Venue:
  • SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.