Exploring the use of latent topical information for statistical Chinese spoken document retrieval

Authors:
Berlin Chen
Affiliations:
Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, No. 88, Section 4, Ting-Chow Road, Taipei 116, Taiwan, ROC
Venue:
Pattern Recognition Letters
Year:
2006



Abstract

Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing attention. However, most information retrieval approaches rely primarily on literal term matching and operate in a deterministic manner. As a result, their performance is often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and enhance retrieval performance, this paper explores the use of a topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches are extensively investigated. The retrieval capabilities of the model are also verified by comparison with the probabilistic latent semantic analysis (PLSA) model, the vector space model (VSM), the latent semantic indexing (LSI) model, and our previously proposed HMM/N-gram retrieval model. Experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3), and noticeable improvements in retrieval performance were obtained.
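The abstract contrasts literal term matching with a topical mixture model. The sketch below illustrates that contrast under a common formulation, which is assumed here rather than quoted from the paper. Each document D is scored by its query likelihood P(Q|D) = product over query terms w of sum_k P(w|T_k) P(T_k|D), where the latent topics T_k are shared across all documents. The toy topics and documents, the helper names `literal_match_score` and `tmm_log_likelihood`, and the `floor` guard are all hypothetical. Training P(w|T_k) and P(T_k|D) (for example, by EM) is omitted.

```python
import math

# Toy topic-word distributions P(w | T_k) for K = 2 latent topics.
# Hypothetical illustrative data, not taken from the TDT collections.
TOPIC_WORD = [
    {"stock": 0.5, "market": 0.3, "economy": 0.2},  # finance-like topic
    {"game": 0.5, "team": 0.3, "score": 0.2},       # sports-like topic
]


def literal_match_score(query, doc_terms):
    """Count query terms that literally occur in the document."""
    doc = set(doc_terms)
    return sum(1 for w in query if w in doc)


def tmm_log_likelihood(query, doc_topic, topic_word, floor=1e-12):
    """log P(Q | D) = sum over query terms w of log sum_k P(w | T_k) P(T_k | D).

    doc_topic[k] is P(T_k | D); `floor` is a hypothetical guard against log(0).
    """
    score = 0.0
    for w in query:
        # Mix the topic-specific word probabilities using the document's topic weights.
        p = sum(p_k * topic_word[k].get(w, 0.0) for k, p_k in enumerate(doc_topic))
        score += math.log(max(p, floor))
    return score


# Two toy documents: their words plus their topic mixture weights P(T_k | D).
DOCS = {
    "doc_a": (["stock", "market"], [0.9, 0.1]),
    "doc_b": (["game", "team"], [0.1, 0.9]),
}

query = ["economy"]  # this term occurs in neither document
for name, (terms, weights) in DOCS.items():
    print(name, literal_match_score(query, terms),
          round(tmm_log_likelihood(query, weights, TOPIC_WORD), 3))
```

Literal matching gives both toy documents a score of 0 because neither contains "economy". The mixture model still ranks doc_a above doc_b, since doc_a leans toward the topic that generates "economy". This illustrates how latent topics can ease vocabulary mismatch.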