Topic modeling for spoken document retrieval using word- and syllable-level information

  • Authors:
  • Shih-Hsiang Lin; Berlin Chen

  • Affiliations:
  • National Taiwan Normal University, Taipei, Taiwan, ROC; National Taiwan Normal University, Taipei, Taiwan, ROC

  • Venue:
  • SSCS '09 Proceedings of the third workshop on Searching spontaneous conversational speech
  • Year:
  • 2009

Abstract

Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.