Searching microblogs: coping with sparsity and document quality

  • Authors: Nasir Naveed, Thomas Gottron, Jérôme Kunegis, Arifah Che Alhadi

  • Affiliations: University of Koblenz-Landau, Koblenz, Germany (all authors)

  • Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year: 2011

Abstract

Two of the main challenges in microblog retrieval are the inherent sparsity of the documents and the difficulty of assessing their quality. Feature sparsity is a direct consequence of the medium's restriction to short texts. Quality assessment is necessary because microblog documents range from spam through trivia and personal chatter to news broadcasts, information dissemination, and reports on current hot topics. In this paper we analyze how these challenges can influence standard retrieval models and propose methods to overcome the problems they pose. We consider the effect of sparsity on document length normalization and introduce "interestingness" as a static quality measure. Our results show that deliberately ignoring length normalization yields better retrieval results in general, and that interestingness improves retrieval for underspecified queries.
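
The two ideas in the abstract, switching off length normalization and weighting results by a static quality score, can be illustrated with a small sketch. The abstract does not name the retrieval model or say how interestingness is computed, so everything here is an assumption: BM25 stands in for a "standard retrieval model" (its parameter b = 0 disables length normalization), the function names `bm25_score` and `rank` are hypothetical, the quality values are invented for illustration, and multiplying the query score by the quality value is only one possible way to combine them.

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of one tokenized document for a tokenized query.

    Setting b=0 removes document length normalization entirely.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        # Length normalization factor; equals 1 for every document when b=0.
        norm = 1 - b + b * len(doc) / avgdl
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score

def rank(query, docs, quality=None, b=0.0):
    """Rank document indices by query score times a static quality value.

    `quality` is a hypothetical stand-in for a query-independent measure
    such as interestingness; the multiplicative combination is an assumption.
    """
    quality = quality or [1.0] * len(docs)
    scored = [
        (bm25_score(query, d, docs, b=b) * q, i)
        for i, (d, q) in enumerate(zip(docs, quality))
    ]
    return [i for _, i in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]

# Toy collection: a one-word post, a longer informative post, an off-topic post.
docs = [
    ["earthquake"],
    ["earthquake", "hits", "coast", "officials", "report", "damage", "link"],
    ["lunch", "was", "great"],
]
query = ["earthquake"]
quality = [0.2, 0.9, 0.5]  # invented values, for illustration only

ranking = rank(query, docs, quality=quality, b=0.0)
```

With the default b = 0.75 the one-word post outscores the longer post purely because it is shorter. With b = 0 the two posts tie on the single query term, so for this underspecified query the static quality value decides the order.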