Mandarin-English Information (MEI): investigating translingual speech retrieval

  • Authors:
  • Helen Meng;Berlin Chen;Sanjeev Khudanpur;Gina-Anne Levow;Wai-Kit Lo;Douglas Oard;Patrick Schone;Karen Tang;Hsin-Min Wang;Jianqiang Wang

  • Affiliations:
  • The Chinese University of Hong Kong;Academia Sinica;Johns Hopkins University;University of Maryland at College Park;The Chinese University of Hong Kong;University of Maryland at College Park;-;Princeton University;Academia Sinica;University of Maryland at College Park

  • Venue:
  • HLT '01 Proceedings of the first international conference on Human language technology research
  • Year:
  • 2001

Abstract

This paper describes the Mandarin-English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence this is both a cross-language and a cross-media retrieval task. We applied a multi-scale approach to the problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation yield performance gains, and that multi-scale retrieval outperforms word-based retrieval.
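
The dictionary-based query translation step can be illustrated with a minimal sketch. The abstract does not specify how phrase-based and word-by-word translation are combined, so the greedy longest-match strategy below is an assumption, as are the function name `translate_query`, the `max_phrase_len` parameter, and the toy dictionaries. Words found in neither dictionary are only collected; in the described system such terms (typically named entities) would go on to the subword transliteration step, which is not modeled here.

```python
# Hypothetical toy dictionaries; a real system would load a bilingual term list.
PHRASE_DICT = {"hong kong": ["香港"], "stock market": ["股票市场"]}
WORD_DICT = {"stock": ["股票"], "market": ["市场"], "news": ["新闻", "消息"]}


def translate_query(tokens, phrase_dict, word_dict, max_phrase_len=3):
    """Translate tokens phrase-first, then word by word (assumed strategy).

    Returns (translated_terms, untranslated_words). All translation
    alternatives listed in a dictionary entry are kept.
    """
    translated, untranslated = [], []
    i = 0
    while i < len(tokens):
        # Try the longest multi-word phrase starting at position i first.
        for n in range(min(max_phrase_len, len(tokens) - i), 1, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrase_dict:
                translated.extend(phrase_dict[phrase])
                i += n
                break
        else:
            # No phrase matched here: fall back to single-word lookup.
            word = tokens[i]
            if word in word_dict:
                translated.extend(word_dict[word])
            else:
                # Left for a separate transliteration step (not shown).
                untranslated.append(word)
            i += 1
    return translated, untranslated


query = "hong kong stock market news kosovo".split()
terms, leftovers = translate_query(query, PHRASE_DICT, WORD_DICT)
print(terms)      # translated Chinese terms
print(leftovers)  # words with no dictionary entry
```

In this sketch, "stock market" is translated as a phrase rather than as two separate words, which is the kind of case where phrase-based lookup would be expected to help over word-by-word translation alone.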