Latent topic modelling of word co-occurrence information for spoken document retrieval

  • Authors: Berlin Chen
  • Affiliations: Department of Computer Science & Information Engineering, National Taiwan Normal University, Taipei, Taiwan
  • Venue: ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year: 2009

Abstract

In this paper, we present a word topic model (WTM) approach to spoken document retrieval (SDR) that captures both the co-occurrence relationships between words and long-span latent topic information. Each document as a whole is modeled as a composite WTM that generates the observed query. We investigate the underlying characteristics of WTM and several model structures, and we analyze and verify its performance by comparison with a few existing retrieval models on the TDT-2 SDR task. We also attempt to incorporate part-of-speech (POS) weighting into the representations of the query observations and the WTMs to obtain better retrieval performance.
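
The abstract does not state the model equations, so the following is only a minimal sketch of how a document might be scored as a composite WTM under stated assumptions. It assumes that each vocabulary word v has its own topic mixture, P(w | M_v) = sum_k P(w | T_k) P(T_k | M_v), that a document's composite model weights the WTMs of its words by their relative frequency in the document, and that documents are ranked by query log-likelihood. The function names, the two-topic setup, and all probability values are hypothetical toy choices, not parameters or results from the paper, and no smoothing or POS weighting is applied.

```python
import math
from collections import Counter

def wtm_word_prob(w, v, p_word_given_topic, p_topic_given_wtm):
    """Assumed form: P(w | M_v) = sum_k P(w | T_k) * P(T_k | M_v)."""
    return sum(p_word_given_topic[k].get(w, 0.0) * p_tk
               for k, p_tk in enumerate(p_topic_given_wtm[v]))

def query_log_likelihood(query, document, p_word_given_topic, p_topic_given_wtm):
    """Log-likelihood of the query under the document's composite WTM.

    Assumed form: P(w | D) = sum_{v in D} (count(v, D) / |D|) * P(w | M_v).
    """
    doc_counts = Counter(document)
    doc_len = len(document)
    log_lik = 0.0
    for w in query:
        p_w = sum((count / doc_len)
                  * wtm_word_prob(w, v, p_word_given_topic, p_topic_given_wtm)
                  for v, count in doc_counts.items())
        log_lik += math.log(p_w)  # raises ValueError if p_w is 0 (no smoothing here)
    return log_lik

# Hypothetical toy parameters: two latent topics over a three-word vocabulary.
P_WORD_GIVEN_TOPIC = [
    {"storm": 0.5, "rain": 0.4, "vote": 0.1},  # topic 0, weather-like
    {"storm": 0.1, "rain": 0.1, "vote": 0.8},  # topic 1, politics-like
]
P_TOPIC_GIVEN_WTM = {
    "storm": [0.9, 0.1],
    "rain": [0.8, 0.2],
    "vote": [0.1, 0.9],
}

weather_doc = ["storm", "rain", "rain"]
politics_doc = ["vote", "vote", "storm"]
query = ["rain"]

weather_score = query_log_likelihood(query, weather_doc,
                                     P_WORD_GIVEN_TOPIC, P_TOPIC_GIVEN_WTM)
politics_score = query_log_likelihood(query, politics_doc,
                                      P_WORD_GIVEN_TOPIC, P_TOPIC_GIVEN_WTM)
```

With these toy values the weather document scores higher for the query than the politics document, because the WTMs of its words place more mass on the topic that generates "rain". In a real system the topic parameters would be estimated from training data rather than set by hand.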