Searching microblogs: coping with sparsity and document quality

Authors:
Nasir Naveed;Thomas Gottron;Jérôme Kunegis;Arifah Che Alhadi
Affiliations:
University of Koblenz-Landau, Koblenz, Germany;University of Koblenz-Landau, Koblenz, Germany;University of Koblenz-Landau, Koblenz, Germany;University of Koblenz-Landau, Koblenz, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 11

Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

Latent dirichlet allocation
The Journal of Machine Learning Research

TwitterRank: finding topic-sensitive influential twitterers
Proceedings of the third ACM international conference on Web search and data mining

What is Twitter, a social network or a news media?
Proceedings of the 19th international conference on World wide web

The Edinburgh Twitter corpus
WSA '10 Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

Ranking Approaches for Microblog Search
WI-IAT '10 Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

#TwitterSearch: a comparison of microblog search and web search
Proceedings of the fourth ACM international conference on Web search and data mining

Predicting popular messages in Twitter
Proceedings of the 20th international conference companion on World wide web

Design and implementation of relevance assessments using crowdsourcing
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval

Comparing twitter and traditional media using topic models
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval

Incorporating query expansion and quality indicators in searching microblog posts
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval

Cited 8

LiveTweet: monitoring and predicting interesting microblog posts
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

Quality models for microblog retrieval
Proceedings of the 21st ACM international conference on Information and knowledge management

Language processing for arabic microblog retrieval
Proceedings of the 21st ACM international conference on Information and knowledge management

Temporal models for microblogs
Proceedings of the 21st ACM international conference on Information and knowledge management

A summarization tool for time-sensitive social media
Proceedings of the 21st ACM international conference on Information and knowledge management

Pseudo test collections for training and tuning microblog rankers
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Improving LDA topic models for microblogs via tweet pooling and automatic labeling
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

TweetMogaz: a news portal of tweets
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval


Abstract

Two of the main challenges in retrieval on microblogs are the inherent sparsity of the documents and the difficulty of assessing their quality. The feature sparsity is inherent in the medium's restriction to short texts. Quality assessment is necessary because microblog documents range from spam, trivia and personal chatter to news broadcasts, information dissemination and reports on current hot topics. In this paper we analyze how these challenges influence standard retrieval models and propose methods to overcome the problems they pose. We consider the effect of sparsity on document length normalization and introduce "interestingness" as a static quality measure. Our results show that deliberately ignoring length normalization yields better retrieval results in general, and that interestingness improves retrieval for underspecified queries.
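
The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names, not the authors' method. It assumes a simple TF-IDF scorer with pivoted length normalization, where a hypothetical `slope` parameter of 0 switches normalization off, and a hypothetical linear mix of the text score with a precomputed static "interestingness" value. The function names, the `weight` parameter and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def score(query, doc, docs, slope=0.0):
    """TF-IDF score with pivoted document length normalization.

    slope=0.0 makes the normalization factor 1, i.e. document length
    is ignored; larger slopes penalize longer-than-average documents.
    """
    doc_tokens = tokenize(doc)
    tf = Counter(doc_tokens)
    avg_len = sum(len(tokenize(d)) for d in docs) / len(docs)
    norm = (1.0 - slope) + slope * (len(doc_tokens) / avg_len)
    total = 0.0
    for term in tokenize(query):
        if tf[term] == 0:
            continue
        # Document frequency: number of documents containing the term.
        df = sum(1 for d in docs if term in tokenize(d))
        idf = math.log((len(docs) + 1) / df)
        total += (tf[term] / norm) * idf
    return total


def combined(text_score, interestingness, weight=0.5):
    """Hypothetical linear mix of a query-dependent text score and a
    query-independent (static) quality score."""
    return (1.0 - weight) * text_score + weight * interestingness


# Toy collection: one short and one longer post mentioning the same term.
docs = [
    "obama speech",
    "obama speech tonight live stream watch now here",
]
no_norm = [score("obama", d, docs, slope=0.0) for d in docs]
with_norm = [score("obama", d, docs, slope=0.5) for d in docs]
```

In this toy setup, both posts receive the same score when `slope=0.0`, while `slope=0.5` favors the shorter post purely because of its length. With very short texts, such length differences carry little information about relevance, which is one way to read the abstract's finding that ignoring length normalization helps.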