Exploring term dependences in probabilistic information retrieval model

  • Authors:
  • Bong-Hyun Cho;Changki Lee;Gary Geunbae Lee

  • Affiliations:
  • R&D Center, Voiceware Co., Ltd., 4th Floor Doosan Credit Union Building, 651 Daechi-Dong, Gangnam-Gu, Seoul 135-280, South Korea;Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyoja dong, Nam Gu, Pohang 790-784, South Korea;Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31 Hyoja dong, Nam Gu, Pohang 790-784, South Korea

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2003

Abstract

Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of one another. However, this conditional independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into the probabilistic retrieval model by adapting the Bahadur-Lazarsfeld expansion (BLE) to compensate for the weakness of the assumption. In this paper, we describe a theoretical process for applying the BLE to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on two standard document collections, HANTEC2.0 in Korean and WT10g in English, we demonstrate that incorporating term dependences using the BLE contributes significantly to improved performance in IR systems for at least two different languages.
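
For orientation, the general textbook form of the Bahadur-Lazarsfeld expansion for binary term-occurrence variables can be sketched as follows. The notation (x_i, p_i, z_i, rho) is assumed here for illustration and is not taken from the paper; the abstract does not state how the expansion is truncated or how its parameters are estimated.

```latex
% Assumed notation: x_i \in \{0,1\} marks whether term i occurs,
% and p_i = P(x_i = 1) is its marginal occurrence probability.
\[
  P(x_1,\dots,x_n) \;=\; P_1(x)\,
  \Bigl[\, 1 + \sum_{i<j} \rho_{ij}\, z_i z_j
         + \sum_{i<j<k} \rho_{ijk}\, z_i z_j z_k
         + \cdots + \rho_{12\cdots n}\, z_1 z_2 \cdots z_n \Bigr]
\]
% Independence baseline, standardized variables, and correlation parameters:
\[
  P_1(x) = \prod_{i=1}^{n} p_i^{x_i}(1-p_i)^{1-x_i},
  \qquad
  z_i = \frac{x_i - p_i}{\sqrt{p_i(1-p_i)}},
  \qquad
  \rho_{ij} = E[z_i z_j].
\]
```

When every correlation parameter is zero, the bracket equals 1 and the expression reduces to the usual term-independence model P_1(x); the correction terms are what carry the dependence information.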