Optimizing language model information retrieval system with expectation maximization algorithm

  • Authors:
  • Justin Liang-Te Chiu;Jyun-Wei Huang

  • Affiliations:
  • National Taiwan University, Taipei, Taiwan, ROC;Yuan Ze University, Chungli, Taoyuan, Taiwan, ROC

  • Venue:
  • ACLstudent '09 Proceedings of the ACL-IJCNLP 2009 Student Research Workshop
  • Year:
  • 2009

Abstract

Statistical language modeling (SLM) has been used in many different domains for decades and has more recently been applied to information retrieval (IR). Documents retrieved using this approach are ranked according to their probability of generating the given query. In this paper, we present a novel approach that employs the generalized Expectation Maximization (EM) algorithm to improve language models by representing their parameters as observation probabilities of Hidden Markov Models (HMM). In the experiments, we demonstrate that our method outperforms standard SLM-based and tf.idf-based methods on TREC 2005 HARD Track data.
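The ranking principle described above, where each document is scored by the probability that its language model generates the query, can be illustrated with a small sketch. It is not the authors' method: the abstract does not specify their HMM parameterization or their generalized EM procedure. The sketch instead uses the familiar two-state mixture (a document state and a collection state) and estimates its single mixing weight `lam` with standard EM. The function names, toy documents, training pairs, and the clamping of `lam` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def term_probs(w, doc_counts, doc_len, coll_counts, coll_len):
    """Return maximum-likelihood estimates (P(w | d), P(w | C))."""
    p_doc = doc_counts[w] / doc_len if doc_len else 0.0
    p_coll = coll_counts[w] / coll_len
    return p_doc, p_coll


def query_log_likelihood(query, doc, coll_counts, coll_len, lam):
    """log P(q | d) with P(w) = lam * P(w | d) + (1 - lam) * P(w | C)."""
    doc_counts, doc_len = Counter(doc), len(doc)
    score = 0.0
    for w in query:
        p_doc, p_coll = term_probs(w, doc_counts, doc_len, coll_counts, coll_len)
        if p_coll == 0.0:
            continue  # term unseen in the whole collection: skipped (assumption)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score


def estimate_lambda(pairs, coll_counts, coll_len, lam=0.5, iters=50):
    """EM for the mixing weight lam, given hypothetical (query, relevant doc) pairs."""
    for _ in range(iters):
        resp_sum, n = 0.0, 0
        for query, doc in pairs:
            doc_counts, doc_len = Counter(doc), len(doc)
            for w in query:
                p_doc, p_coll = term_probs(w, doc_counts, doc_len, coll_counts, coll_len)
                if p_coll == 0.0:
                    continue
                # E-step: posterior that this query term came from the document state
                resp_sum += lam * p_doc / (lam * p_doc + (1 - lam) * p_coll)
                n += 1
        if n == 0:
            break
        # M-step: average posterior over all query-term occurrences, kept strictly
        # inside (0, 1) so that unseen terms never produce log(0) (assumption)
        lam = min(max(resp_sum / n, 1e-6), 1 - 1e-6)
    return lam


# Hypothetical toy collection
docs = {
    "d1": "hidden markov model for retrieval".split(),
    "d2": "expectation maximization estimates model parameters".split(),
    "d3": "tf idf weighting for document ranking".split(),
}
coll_counts = Counter(w for d in docs.values() for w in d)
coll_len = sum(coll_counts.values())

# Hypothetical training data: queries paired with a document judged relevant
pairs = [
    ("hidden markov ranking".split(), docs["d1"]),
    ("model parameters weighting".split(), docs["d2"]),
]
lam = estimate_lambda(pairs, coll_counts, coll_len)

query = "markov model".split()
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d], coll_counts, coll_len, lam),
    reverse=True,
)
print(ranking)  # → ['d1', 'd2', 'd3']
```

Interpolating with the collection model keeps every score finite for documents that lack some query term (here `d3` contains neither query word yet still receives a score). EM then picks the interpolation weight that maximizes the likelihood of the training queries rather than fixing it by hand.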