A Dirichlet-Smoothed Bigram Model for Retrieving Spontaneous Speech

  • Authors:
  • Matthew Lease;Eugene Charniak

  • Affiliations:
  • Brown Laboratory for Linguistic Information Processing (BLLIP), Brown University, Providence, USA;Brown Laboratory for Linguistic Information Processing (BLLIP), Brown University, Providence, USA

  • Venue:
  • Advances in Multilingual and Multimodal Information Retrieval
  • Year:
  • 2008

Abstract

We present two simple but effective smoothing techniques for the standard language model (LM) approach to information retrieval [12]. First, we extend the unigram Dirichlet smoothing technique popular in IR [17] to bigram modeling [16]. Second, we propose a method of collection expansion for more robust estimation of the LM prior, particularly intended for sparse collections. Retrieval experiments on the MALACH archive [9] of automatically transcribed and manually summarized spontaneous speech interviews demonstrate strong overall system performance and the relative contribution of our extensions.
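
For readers unfamiliar with Dirichlet smoothing, the sketch below illustrates the general idea. The unigram estimate is the standard one, P(w|D) = (c(w;D) + mu * P(w|C)) / (|D| + mu). The bigram function shows one natural way to carry the same form over to bigrams; it is an assumption made for illustration and not necessarily the formulation used in the paper. All function names, parameter names, and the toy data are hypothetical.

```python
from collections import Counter


def unigram_dirichlet(word, doc_tokens, coll_counts, coll_len, mu=1000.0):
    """Standard Dirichlet-smoothed unigram estimate P(word | document)."""
    # Collection (prior) probability of the word.
    p_coll = coll_counts[word] / coll_len
    # Document count plus mu pseudo-counts drawn from the collection model.
    return (doc_tokens.count(word) + mu * p_coll) / (len(doc_tokens) + mu)


def bigram_dirichlet(prev, word, doc_tokens, coll_bigram_prob, mu=1000.0):
    """Illustrative bigram analogue P(word | prev, document).

    ASSUMPTION: this mirrors the unigram form, with the collection bigram
    probability P(word | prev, C) as the prior; the paper may differ.
    """
    pairs = list(zip(doc_tokens, doc_tokens[1:]))
    # Occurrences of the bigram (prev, word) in the document.
    bigram_count = sum(1 for pair in pairs if pair == (prev, word))
    # Occurrences of prev as a bigram history in the document.
    prev_count = sum(1 for first, _ in pairs if first == prev)
    return (bigram_count + mu * coll_bigram_prob) / (prev_count + mu)


# Toy data, for illustration only.
collection = "a b a c".split()
doc = "a b".split()
coll_counts = Counter(collection)

p_a = unigram_dirichlet("a", doc, coll_counts, len(collection), mu=2.0)
p_ab = bigram_dirichlet("a", "b", doc, coll_bigram_prob=0.5, mu=2.0)
```

In both functions a larger mu pulls the estimate toward the collection model, which matters most for short or sparse documents.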