Comparison of Word and Subword Indexing Techniques for Mandarin Chinese Spoken Document Retrieval

Authors:
Hsin-Min Wang; Berlin Chen
Venue:
PCM '01 Proceedings of the Second IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Year:
2001

Abstract

In this paper, we investigate the use of words and subwords (both characters and syllables) as indexing units for Mandarin Chinese spoken document retrieval. Two retrieval approaches are used: the well-known vector space model approach and a newly proposed HMM/N-gram-based approach. We focus on using an entire Chinese text story from a newspaper as the query to retrieve Mandarin Chinese spoken documents from news broadcasts. Experiments are conducted on the Topic Detection and Tracking (TDT) corpora.
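The abstract names the ingredients without giving their details. What follows is a minimal illustrative sketch under stated assumptions, not the authors' implementation.

- **Index terms.** These are assumed to be all overlapping n-grams (lengths 1 to `max_n`) of whatever unit sequence is supplied. The units can be characters, syllables, or words from a segmenter that is not shown.
- **Vector space model.** Scoring uses log-TF × IDF weights with cosine similarity.
- **HMM/N-gram approach.** This is approximated by a generic query-likelihood mixture of document unigram, document bigram, and collection unigram probabilities. The mixture weights `lambdas` are hypothetical.

All function names are invented for illustration, and the toy documents are made-up examples rather than TDT data.

```python
import math
from collections import Counter

def ngrams(units, n):
    """Overlapping n-grams over a unit sequence (characters, syllables, or words)."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

def index_terms(units, max_n=2):
    """Assumed indexing scheme: all n-grams of length 1..max_n, with counts."""
    terms = Counter()
    for n in range(1, max_n + 1):
        terms.update(ngrams(units, n))
    return terms

def weight(terms, idf):
    """Log-TF x IDF weights; terms unseen in the collection are dropped."""
    return {t: (1.0 + math.log(tf)) * idf[t] for t, tf in terms.items() if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def vsm_rank(query_units, docs_units, max_n=2):
    """Rank documents by cosine similarity to the query (vector space model)."""
    doc_terms = [index_terms(d, max_n) for d in docs_units]
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())
    n_docs = len(docs_units)
    # +1 keeps terms that occur in every document from getting zero weight
    idf = {t: math.log(n_docs / c) + 1.0 for t, c in df.items()}
    q = weight(index_terms(query_units, max_n), idf)
    scores = [cosine(q, weight(t, idf)) for t in doc_terms]
    order = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return order, scores

def ngram_loglik(query_units, doc_units, collection_units,
                 lambdas=(0.5, 0.3, 0.2), floor=1e-9):
    """log P(query | doc) under an assumed mixture of document unigram,
    document bigram, and collection unigram models (hypothetical weights)."""
    l_uni, l_bi, l_col = lambdas
    d_uni = Counter(doc_units)
    d_ctx = Counter(doc_units[:-1])  # units that have a successor in the doc
    d_bi = Counter(zip(doc_units, doc_units[1:]))
    c_uni = Counter(collection_units)
    n_doc, n_col = len(doc_units), len(collection_units)
    loglik, prev = 0.0, None
    for u in query_units:
        p_uni = d_uni[u] / n_doc if n_doc else 0.0
        p_bi = d_bi[(prev, u)] / d_ctx[prev] if d_ctx[prev] else 0.0
        p_col = c_uni[u] / n_col if n_col else 0.0
        # floor avoids log(0) for units unseen anywhere
        loglik += math.log(l_uni * p_uni + l_bi * p_bi + l_col * p_col + floor)
        prev = u
    return loglik

# Toy usage with character units (invented example documents)
docs = ["股市今天大幅上漲", "颱風明天登陸台灣", "總統今天發表談話"]
query = "今天股市上漲"
order, _ = vsm_rank(list(query), [list(d) for d in docs])
print(order)  # [0, 2, 1]
collection = "".join(docs)
print([round(ngram_loglik(query, d, collection), 2) for d in docs])
```

Only the unit sequence changes when you switch indexing units. You can pass a list of pinyin syllables or word-segmented tokens instead of `list(query)`, and the scoring code stays the same. That is the kind of word-versus-subword comparison the title refers to.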