Mandarin-English Information (MEI): investigating translingual speech retrieval

Authors:
Helen Meng;Berlin Chen;Sanjeev Khudanpur;Gina-Anne Levow;Wai-Kit Lo;Douglas Oard;Patrick Schone;Karen Tang;Hsin-Min Wang;Jianqiang Wang
Affiliations:
The Chinese University of Hong Kong;Academia Sinica;Johns Hopkins University;University of Maryland at College Park;The Chinese University of Hong Kong;University of Maryland at College Park;-;Princeton University;Academia Sinica;University of Maryland at College Park
Venue:
HLT '01 Proceedings of the first international conference on Human language technology research
Year:
2001

Abstract

This paper describes the Mandarin-English Information (MEI) project, in which we investigated cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. The task is therefore both cross-language and cross-media. We applied a multi-scale approach that unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese with a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation gave performance gains, and that multi-scale retrieval outperformed word-based retrieval.
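
The abstract does not specify how phrase-based and word-by-word dictionary translation are combined, so the following is only a minimal sketch of one plausible arrangement: greedy longest-match phrase lookup, with single-word lookup as the fallback. The function name `translate_query`, the `max_phrase_len` parameter and the toy dictionaries are all hypothetical and are not taken from the MEI system. Words found in neither dictionary are merely collected; in the paper, such untranslatable named entities are passed to a subword transliteration step that is not modelled here.

```python
from typing import Dict, List, Tuple


def translate_query(
    tokens: List[str],
    phrase_dict: Dict[str, List[str]],
    word_dict: Dict[str, List[str]],
    max_phrase_len: int = 3,
) -> Tuple[List[str], List[str]]:
    """Greedy longest-match phrase translation with word-by-word fallback.

    Illustrative sketch only; not the MEI project's actual procedure.
    Returns (translated terms, words with no dictionary entry).
    """
    translated: List[str] = []
    untranslated: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, down to two-word phrases.
        for n in range(min(max_phrase_len, len(tokens) - i), 1, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrase_dict:
                translated.extend(phrase_dict[phrase])
                i += n
                matched = True
                break
        if matched:
            continue
        # No phrase matched: fall back to single-word lookup,
        # keeping every listed translation alternative.
        word = tokens[i]
        if word in word_dict:
            translated.extend(word_dict[word])
        else:
            untranslated.append(word)
        i += 1
    return translated, untranslated


# Toy dictionaries, invented purely for illustration.
PHRASES = {"hong kong": ["香港"]}
WORDS = {"news": ["新闻"], "story": ["故事", "报道"]}

terms, leftovers = translate_query(
    ["hong", "kong", "news", "story", "kosovo"], PHRASES, WORDS
)
```

Matching phrases before words keeps "hong kong" from being split into two unrelated single-word lookups, which is the usual motivation for layering phrase translation on top of word-by-word translation.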