Topic modeling for spoken document retrieval using word- and syllable-level information

Authors:
Shih-Hsiang Lin;Berlin Chen
Affiliations:
National Taiwan Normal University, Taipei, Taiwan ROC;National Taiwan Normal University, Taipei, Taiwan ROC
Venue:
SSCS '09 Proceedings of the third workshop on Searching spontaneous conversational speech
Year:
2009

Abstract

Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.
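
As a rough illustration of the kind of scoring a document topic model implies, the sketch below ranks a document by the query likelihood under a mixture of latent topics, log P(Q|D) = sum over query terms of log sum_k P(term|T_k) P(T_k|D), and then interpolates a word-level score with a syllable-level score. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the probability floor, the toy probabilities, and the interpolation weight are all hypothetical, and the abstract does not say how the two feature levels are actually combined.

```python
import math

# Hypothetical floor for terms unseen under a topic; not taken from the paper.
UNSEEN_PROB = 1e-9


def topic_query_log_likelihood(query_terms, term_given_topic, topic_given_doc):
    """Return log P(Q|D) under a topic mixture.

    term_given_topic: list of dicts, one per topic, mapping term -> P(term|T_k)
    topic_given_doc:  list of floats, P(T_k|D) for the document being scored
    """
    log_likelihood = 0.0
    for term in query_terms:
        # Marginalize over the latent topics for this query term.
        prob = sum(
            term_given_topic[k].get(term, UNSEEN_PROB) * topic_given_doc[k]
            for k in range(len(topic_given_doc))
        )
        log_likelihood += math.log(prob)
    return log_likelihood


def combined_score(word_score, syllable_score, weight=0.5):
    """Linearly interpolate two log-domain scores (an assumed fusion rule)."""
    return weight * word_score + (1.0 - weight) * syllable_score


# Toy example: two topics, one document, word-level and syllable-level models.
word_topics = [{"news": 0.5, "taipei": 0.5}, {"news": 0.1, "taipei": 0.9}]
syllable_topics = [{"tai_bei": 0.6}, {"tai_bei": 0.2}]
doc_topic_weights = [0.5, 0.5]

word_score = topic_query_log_likelihood(["news"], word_topics, doc_topic_weights)
syllable_score = topic_query_log_likelihood(
    ["tai_bei"], syllable_topics, doc_topic_weights
)
final_score = combined_score(word_score, syllable_score, weight=0.5)
```

Scoring in the log domain keeps the product over query terms numerically stable. The syllable-level terms here are written as overlapping syllable pairs purely for illustration; the actual indexing units would depend on the recognizer output.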