Probabilistic score propagation in information retrieval

  • Authors:
  • Chengxiang Zhai; Azadeh Shakery

  • Affiliations:
  • University of Illinois at Urbana-Champaign; University of Illinois at Urbana-Champaign

  • Venue:
  • Probabilistic score propagation in information retrieval
  • Year:
  • 2008

Abstract

Information retrieval techniques deal with different units of information, such as terms, topics, or documents. Explicit or implicit link structures usually exist between items of the same unit or between items of different units. For example, hyperlinks between pages in a hypertext collection are explicit structures, while the links between terms in a co-occurrence network are implicit structures. Many traditional information retrieval methods use only the content of the items for retrieval and overlook the link structures. Those that do use link structures fully exploit neither the discriminative power of content nor all of the useful link information. In this thesis, we propose a general probabilistic score propagation framework for combining content and link information, which takes full advantage of both in a principled way. The basic idea of probabilistic score propagation is to first compute a content-based probability score for each item and then propagate the probabilities through different groups of neighbors: the content information serves as the basis for each item's content probability score, and the link structure defines the groups of neighbors through which the probabilities are propagated. We study three applications of this framework for improving retrieval accuracy: "Hypertext Retrieval", "Smoothing of Document Language Models", and "Cross-Language Information Retrieval". The experimental results show that the score propagation framework provides a general and effective way of exploiting link information together with content information to improve retrieval accuracy.
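The two-step idea in the abstract (a content-based score per item, then propagation through groups of neighbors) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual propagation formula, so everything below is an assumption made for illustration: the function name `propagate_scores`, the linear interpolation between an item's own content score and the average score of each neighbor group, the fallback for items with no neighbors in a group, and the toy scores and weights.

```python
from typing import Dict, List


def propagate_scores(
    content_scores: Dict[str, float],
    neighbor_groups: Dict[str, Dict[str, List[str]]],
    group_weights: Dict[str, float],
    self_weight: float,
    iterations: int = 1,
) -> Dict[str, float]:
    """Hypothetical interpolation-style score propagation (not the thesis's formula).

    Each item's new score mixes its own content score with the average
    current score of its neighbors in each group (e.g. in-links, out-links).
    """
    total = self_weight + sum(group_weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("self_weight and group_weights must sum to 1")

    scores = dict(content_scores)  # step 1: start from content-based scores
    for _ in range(iterations):
        updated = {}
        for item, own in content_scores.items():
            new_score = self_weight * own
            for group, weight in group_weights.items():
                neighbors = neighbor_groups.get(group, {}).get(item, [])
                if neighbors:
                    # step 2: average the scores propagated from this group
                    incoming = sum(scores[n] for n in neighbors) / len(neighbors)
                else:
                    # assumption: with no neighbors, the item keeps this share itself
                    incoming = scores[item]
                new_score += weight * incoming
            updated[item] = new_score
        scores = updated
    return scores


# Toy hypertext example with made-up content scores and link groups.
content = {"A": 0.6, "B": 0.3, "C": 0.1}
groups = {
    "in_links": {"A": ["B"], "B": ["C"], "C": []},
    "out_links": {"A": [], "B": ["A"], "C": ["B"]},
}
result = propagate_scores(
    content, groups, {"in_links": 0.3, "out_links": 0.2}, self_weight=0.5
)
print({k: round(v, 2) for k, v in result.items()})
# prints {'A': 0.51, 'B': 0.3, 'C': 0.14}
```

In this sketch, page C has a low content score but links to B, so its score rises after one propagation step. The scores are not renormalized, which would be a further design choice if they must remain a probability distribution.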