Translation Disambiguation in Mixed Language Queries

  • Authors:
  • Percy Cheung; Pascale Fung

  • Affiliations:
  • Human Language Technology Center, Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong (both authors)

  • Venue:
  • Machine Translation
  • Year:
  • 2004

Abstract

Code-switching is very common among bilingual speakers. Spoken queries by these speakers are typically in mixed language. In this paper, we propose an unsupervised method for mixed-language query understanding, using only a monolingual corpus and a bilingual dictionary. Secondary-language words mixed into a primary-language query are translated into words in the primary language. We found that using a single disambiguation feature for translation is more effective than using multiple features, provided this feature is based on the most salient seed-word, chosen automatically by confidence scoring. We propose and compare four types of disambiguation features that are based on context seed-words. A baseline method uses the nearest neighboring seed-word as the disambiguation feature. Multiple-context seed-word voting is also proposed in order to enlarge the context window. In contrast, merely weighting context words by inverse distance degrades performance, as it runs counter to the potential underlying syntactic relations between words. Our final proposal is a solution that uses multiple-context seed-words and the translation candidates of all mixed-language words to select a single most salient seed-word for translation disambiguation. The translation disambiguation accuracy for this feature is 83.7% for all words in the ATIS spontaneous speech query database, and 66.7% for content words.
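
The contrast between the nearest-neighbor baseline and the single most salient seed-word can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corpus, the window-based co-occurrence count used as the association score, the smoothed top-to-second ratio used as the confidence score, and all function names. For simplicity it also handles a single mixed-language word, whereas the abstract's final proposal draws on the candidates of all mixed-language words.

```python
from collections import Counter

def cooccurrence_counts(corpus, window=5):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    counts = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            for other in sentence[i + 1 : i + 1 + window]:
                counts[frozenset((word, other))] += 1
    return counts

def pair_score(counts, seed, candidate):
    """Association score (assumed here to be a raw co-occurrence count)."""
    return counts[frozenset((seed, candidate))]

def seed_positions(query, mixed):
    """Positions of primary-language words, which serve as context seed-words."""
    return [i for i in range(len(query)) if i not in mixed]

def nearest_seed_choice(query, mixed, pos, counts):
    """Baseline: disambiguate with the nearest seed-word (ties go to the left)."""
    nearest = min(seed_positions(query, mixed), key=lambda i: (abs(i - pos), i))
    seed = query[nearest]
    best = max(mixed[pos], key=lambda c: pair_score(counts, seed, c))
    return seed, best

def salient_seed_choice(query, mixed, pos, counts):
    """Pick the one seed-word whose top candidate most clearly beats the runner-up.

    The confidence (top + 1) / (second + 1) is a hypothetical stand-in for the
    paper's confidence scoring, which the abstract does not specify.
    """
    best_seed, best_cand, best_conf = None, None, float("-inf")
    for i in seed_positions(query, mixed):
        seed = query[i]
        ranked = sorted(mixed[pos], key=lambda c: pair_score(counts, seed, c),
                        reverse=True)
        top = pair_score(counts, seed, ranked[0])
        second = pair_score(counts, seed, ranked[1]) if len(ranked) > 1 else 0
        conf = (top + 1) / (second + 1)
        if conf > best_conf:
            best_seed, best_cand, best_conf = seed, ranked[0], conf
    return best_seed, best_cand, best_conf

# Toy monolingual (primary-language) corpus; purely illustrative.
corpus = [
    "show me the ticket price for the flight".split(),
    "i need a ticket for the morning flight".split(),
    "the flight ticket is cheap".split(),
    "people vote in the election".split(),
    "i cast the vote".split(),
]
counts = cooccurrence_counts(corpus)

# Mixed-language query: position 3 holds a secondary-language word whose
# bilingual-dictionary entry (assumed) lists two candidate translations.
query = ["i", "need", "a", "票", "for", "the", "flight"]
mixed = {3: ["ticket", "vote"]}

print(nearest_seed_choice(query, mixed, 3, counts))  # ('a', 'ticket')
print(salient_seed_choice(query, mixed, 3, counts))  # ('flight', 'ticket', 4.0)
```

In this toy query both strategies choose "ticket", but they rely on different evidence: the baseline uses the adjacent function word "a", while the confidence-based selection settles on the content word "flight", which separates the two candidates most sharply.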