Translation Disambiguation in Mixed Language Queries

Authors:
Percy Cheung; Pascale Fung
Affiliations:
Human Language Technology Center, Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong (both authors)
Venue:
Machine Translation
Year:
2004

Abstract

Code-switching is very common among bilingual speakers, and spoken queries by these speakers are typically in mixed language. In this paper, we propose an unsupervised method for mixed-language query understanding that uses only a monolingual corpus and a bilingual dictionary: secondary-language words mixed into a primary-language query are translated into primary-language words. We found that using a single disambiguation feature for translation is more effective than using multiple features, provided this feature is based on the most salient seed-word, chosen automatically by confidence scoring. We propose and compare four types of disambiguation features based on context seed-words. A baseline method uses the nearest neighboring seed-word as the disambiguation feature. Multiple-context seed-word voting is also proposed in order to enlarge the context window. By contrast, merely weighting context words by inverse distance degrades performance, because distance weighting can run counter to the underlying syntactic relations between words. Our final proposal uses multiple-context seed-words together with the translation candidates of all mixed-language words to select a single most salient seed-word for translation disambiguation. With this feature, translation disambiguation accuracy reaches 83.7% for all words in the ATIS spontaneous speech query database and 66.7% for content words.
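The four disambiguation features compared in the abstract can be illustrated with a small Python sketch. It rests on simplifying assumptions that do not come from the paper: every query token without a bilingual-dictionary entry is treated as a seed-word, the association score is a raw sentence-level co-occurrence count from a toy monolingual corpus, and a seed-word's confidence is the margin between its best- and second-best-scoring translation candidate. All function names, the toy corpus, and the dictionary entry for 票 ("vote" or "ticket") are hypothetical. The sketch does not model how the paper also uses the translation candidates of the other mixed-language words when selecting the salient seed-word.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: a tiny tokenized primary-language (English) corpus
# and a bilingual dictionary entry for one secondary-language word.
CORPUS = [
    ["buy", "a", "ticket", "to", "boston"],
    ["cast", "a", "vote"],
    ["a", "vote", "today"],
    ["ticket", "price", "boston"],
]
DICTIONARY = {"票": ["vote", "ticket"]}  # candidate translations, in dictionary order


def cooccurrence_counts(corpus):
    """Count how many sentences each unordered word pair co-occurs in."""
    counts = Counter()
    for sentence in corpus:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


def assoc(counts, a, b):
    """Assumed association score: the raw sentence-level co-occurrence count."""
    return counts[tuple(sorted((a, b)))]


def seed_positions(query, dictionary):
    """Assumption: every query token without a dictionary entry is a seed-word."""
    return [j for j, tok in enumerate(query) if tok not in dictionary]


def translate_nearest(query, i, dictionary, counts):
    """Baseline: score candidates against the nearest seed-word only."""
    cands = dictionary[query[i]]
    seeds = seed_positions(query, dictionary)
    if not seeds:
        return cands[0]
    j = min(seeds, key=lambda k: (abs(k - i), k))  # ties go to the left neighbour
    return max(cands, key=lambda c: assoc(counts, c, query[j]))


def translate_vote(query, i, dictionary, counts):
    """Each seed-word votes for its best-scoring candidate (if its score > 0)."""
    cands = dictionary[query[i]]
    votes = Counter()
    for j in seed_positions(query, dictionary):
        best = max(cands, key=lambda c: assoc(counts, c, query[j]))
        if assoc(counts, best, query[j]) > 0:
            votes[best] += 1
    return max(cands, key=lambda c: votes[c])


def translate_inverse_distance(query, i, dictionary, counts):
    """Sum each candidate's scores over all seed-words, weighted by 1 / distance."""
    cands = dictionary[query[i]]
    seeds = seed_positions(query, dictionary)
    return max(cands, key=lambda c: sum(assoc(counts, c, query[j]) / abs(j - i) for j in seeds))


def confidence(counts, cands, seed):
    """Assumed confidence: margin between the best and second-best candidate scores."""
    scores = sorted((assoc(counts, c, seed) for c in cands), reverse=True)
    return scores[0] - (scores[1] if len(scores) > 1 else 0)


def translate_most_salient(query, i, dictionary, counts):
    """Pick the single seed-word with the highest confidence, then use only it."""
    cands = dictionary[query[i]]
    seeds = [query[j] for j in seed_positions(query, dictionary)]
    if not seeds:
        return cands[0]
    best_seed = max(seeds, key=lambda s: confidence(counts, cands, s))
    return max(cands, key=lambda c: assoc(counts, c, best_seed))


def translate_query(query, dictionary, counts, method):
    """Replace every secondary-language token with the candidate chosen by `method`."""
    return [method(query, i, dictionary, counts) if tok in dictionary else tok
            for i, tok in enumerate(query)]


counts = cooccurrence_counts(CORPUS)
query = ["a", "票", "to", "boston"]  # hypothetical mixed-language query
for method in (translate_nearest, translate_vote,
               translate_inverse_distance, translate_most_salient):
    print(method.__name__, translate_query(query, DICTIONARY, counts, method))
```

On this toy input, the nearest-seed baseline picks "vote" because the adjacent seed-word is the function word "a", which co-occurs more often with "vote" in the toy corpus. The margin-based confidence instead selects "boston" as the most salient seed-word and yields "ticket".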