Wikipedia-based semantic smoothing for the language modeling approach to information retrieval

  • Authors:
  • Xinhui Tu; Tingting He; Long Chen; Jing Luo; Maoyuan Zhang

  • Affiliations:
  • Engineering & Research Center for Information Technology on Education, Huazhong Normal University, Wuhan, China; Engineering & Research Center for Information Technology on Education, Huazhong Normal University, Wuhan, China; Birkbeck, University of London; Department of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Engineering & Research Center for Information Technology on Education, Huazhong Normal University, Wuhan, China

  • Venue:
  • ECIR 2010: Proceedings of the 32nd European Conference on Advances in Information Retrieval
  • Year:
  • 2010

Abstract

Semantic smoothing is an effective way to improve retrieval performance in the language modeling approach to information retrieval. Previous methods, such as the translation model, perform semantic mapping at the level of individual terms or phrases. These models handle ambiguous words and phrases poorly because they cannot incorporate contextual information. To overcome this limitation, we propose a novel Wikipedia-based semantic smoothing method that decomposes a document into a set of weighted Wikipedia concepts and then maps those unambiguous Wikipedia concepts to query terms. The mapping probabilities from each Wikipedia concept to individual terms are estimated with the EM algorithm, and document models based on the Wikipedia concept mapping are then derived. The new smoothing method is evaluated on the TREC Ad Hoc Track collections (Disks 1, 2, and 3). Experiments show significant improvements over the two-stage language model, as well as over the language model with translation-based semantic smoothing.
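
The abstract describes two estimation steps: EM estimation of the mapping probabilities from each Wikipedia concept to index terms, and interpolation of the resulting concept component with a baseline document model. The sketch below illustrates both steps in Python under the standard semantic-smoothing mixture formulation (a concept-specific topic model mixed with a fixed corpus background model); the function names and the weights `alpha` and `lam` are illustrative assumptions, not the authors' implementation or parameter settings.

```python
# Minimal sketch (not the authors' code) of concept-based semantic smoothing.
from collections import defaultdict

def estimate_concept_model(term_counts, background, alpha=0.5, iters=20):
    """Estimate p(term | concept) by EM.
    term_counts: dict term -> count over documents labeled with the concept.
    background:  dict term -> p(term | corpus), the background model.
    alpha:       assumed weight of the background model in the mixture."""
    vocab = list(term_counts)
    total = sum(term_counts.values())
    # Initialize with the maximum-likelihood estimate from the concept's documents.
    p_topic = {w: term_counts[w] / total for w in vocab}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the topic model.
        z = {}
        for w in vocab:
            topic = (1.0 - alpha) * p_topic[w]
            z[w] = topic / (topic + alpha * background.get(w, 1e-12))
        # M-step: re-estimate the topic model from the expected topic counts.
        norm = sum(term_counts[w] * z[w] for w in vocab)
        p_topic = {w: term_counts[w] * z[w] / norm for w in vocab}
    return p_topic

def smooth_document_model(concept_weights, concept_models, baseline, lam=0.4):
    """Interpolate a baseline document model with the concept-mapping component:
    p(w|d) = (1 - lam) * p_baseline(w|d) + lam * sum_c p(w|c) * p(c|d)."""
    smoothed = defaultdict(float)
    for w, p in baseline.items():
        smoothed[w] += (1.0 - lam) * p
    for c, p_c in concept_weights.items():       # p(c|d): weighted concepts in d
        for w, p_w in concept_models.get(c, {}).items():  # p(w|c): EM estimates
            smoothed[w] += lam * p_w * p_c
    return dict(smoothed)
```

In this formulation the background mixture during EM discounts common words, so each concept model concentrates probability on terms that are distinctive for that concept; the final interpolation then lets the concept component assign non-zero probability to query terms that never appear in the document itself.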