Statistical Translation Language Model for Twitter Search

Authors: Maryam Karimzadehgan; ChengXiang Zhai; Miles Efron
Affiliations: Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 (Karimzadehgan, Zhai); School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801 (Efron)
Venue: Proceedings of the 2013 Conference on the Theory of Information Retrieval
Year: 2013

Abstract

With the prevalence of social media applications, an increasing number of Internet users actively publish text online. This influx provides a wealth of text information about those users. Ranking in social media poses challenges different from those of Web search ranking; one of them is that microblog messages are very short. As a result, the vocabulary mismatch problem is exacerbated in social media search. In this paper, we first study the standard translation language model for this problem and show that it not only helps bridge the vocabulary gap but also improves the estimate of term frequency. We further propose two ways to improve the translation language model: leveraging hashtag information and adaptively setting the self-translation parameter. Experimental results on a Twitter data set show that the proposed methods are effective.
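The abstract gives no formulas, so the following is only a minimal sketch of a standard translation language model scored by query likelihood, not the authors' implementation. It assumes the common formulation p_t(w|d) = Σ_u t(w|u) p_ml(u|d), a self-translation weight `alpha` that reserves probability mass for w = u, and Dirichlet smoothing with a collection model p(w|C). The translation table, the fixed `alpha` and `mu` defaults, the floor for unseen words, and the function names `translation_prob`, `p_translation` and `score` are all hypothetical. The paper's adaptive setting of the self-translation parameter and its use of hashtags are not reproduced here.

```python
import math
from collections import Counter


def translation_prob(w, u, trans, alpha):
    """Hypothetical adjusted translation probability t'(w|u).

    trans[u] maps w -> t(w|u) and is assumed to sum to 1 over w.
    alpha is a FIXED self-translation weight (assumption; the paper sets it adaptively).
    """
    row = trans.get(u)
    if row is None:
        # No translation row for u: treat it as translating only to itself.
        return 1.0 if w == u else 0.0
    base = (1.0 - alpha) * row.get(w, 0.0)
    return base + (alpha if w == u else 0.0)


def p_translation(w, doc_counts, doc_len, trans, alpha):
    """p_t(w|d) = sum over document words u of t'(w|u) * p_ml(u|d)."""
    return sum(translation_prob(w, u, trans, alpha) * c / doc_len
               for u, c in doc_counts.items())


def score(query, doc, trans, coll_prob, alpha=0.5, mu=100.0, floor=1e-6):
    """Query log-likelihood with Dirichlet smoothing over the translation model.

    coll_prob maps w -> p(w|C); `floor` is an assumed value for words missing from it.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    total = 0.0
    for w, qc in Counter(query).items():
        p_t = p_translation(w, doc_counts, doc_len, trans, alpha)
        p = (doc_len * p_t + mu * coll_prob.get(w, floor)) / (doc_len + mu)
        total += qc * math.log(p)
    return total


# Toy usage with made-up translation probabilities.
trans = {"car": {"car": 0.6, "auto": 0.3, "vehicle": 0.1},
         "auto": {"auto": 0.6, "car": 0.4}}
coll = {"car": 0.01}
print(score(["car"], ["auto", "sale"], trans, coll) > score(["car"], ["bike", "sale"], trans, coll))
print(score(["car"], ["car", "auto"], trans, coll) > score(["car"], ["car", "sale"], trans, coll))
```

In this toy setup, a tweet containing "auto" scores higher for the query "car" than one with no related word, which illustrates bridging the vocabulary gap. The tweet "car auto" also outscores "car sale" even though "car" occurs once in each, which is one way to read the abstract's claim that translation refines the term-frequency estimate for very short documents.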