Ambiguity resolution for machine translation of telegraphic messages

Authors:
Young-Suk Lee;Clifford Weinstein;Stephanie Seneff;Dinesh Tummala
Affiliations:
Lincoln Laboratory, MIT, Lexington, MA;Lincoln Laboratory, MIT, Lexington, MA;SLS, LCS, MIT, Cambridge, MA;Lincoln Laboratory, MIT, Lexington, MA
Venue:
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Year:
1997

Abstract

Telegraphic messages, which contain numerous omissions, pose a new challenge to parsing: a sentence with omissions is more ambiguous than a sentence without omissions. Misparses induced by omissions have far-reaching consequences in machine translation, because a misparse of the input often yields a target-language translation whose meaning is incoherent in the given context. This happens more often when the structures of the source and target languages differ substantially, as with English and Korean. How to parse telegraphic messages accurately and efficiently is therefore a critical issue in machine translation. In this paper we describe a technical solution to this issue and present a performance evaluation of a machine translation system on telegraphic messages before and after adopting the proposed solution. The solution lies in a grammar design that combines lexicalized grammar rules defined in terms of semantic categories with syntactic rules defined in terms of parts of speech. Compared with a purely lexicalized semantic grammar, the proposed grammar achieves higher parsing coverage without increasing ambiguity or misparsing; compared with a purely syntactic grammar, it achieves lower ambiguity and fewer misparses without decreasing parsing coverage.
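
The trade-off described in the abstract can be illustrated with a toy sketch. This is not the authors' grammar or parser: the lexicon, the category names (VESSEL, DETECT), the rules, and the helper functions are all invented for illustration, and flat pattern matching stands in for real parsing. The sketch only shows how a rule set that mixes semantic categories with part-of-speech tags might accept a telegraphic input that a purely semantic rule set misses, while still rejecting an input that a purely part-of-speech rule set would let through.

```python
# Toy illustration only: the lexicon, categories, and rules are hypothetical
# and are not taken from the paper.

# Every word has a part-of-speech tag; only some words carry a semantic category.
LEXICON = {
    "ship":      {"pos": "N", "sem": "VESSEL"},
    "vessel":    {"pos": "N", "sem": "VESSEL"},
    "sighted":   {"pos": "V", "sem": "DETECT"},
    "tracked":   {"pos": "V", "sem": "DETECT"},
    "periscope": {"pos": "N", "sem": None},  # no semantic category assigned
    "sinking":   {"pos": "V", "sem": None},
}

# Three rule sets over flat three-word messages.
SEMANTIC_RULES = [("VESSEL", "DETECT", "VESSEL")]            # semantic categories only
SYNTACTIC_RULES = [("N", "V", "N")]                          # part-of-speech tags only
MIXED_RULES = SEMANTIC_RULES + [("VESSEL", "DETECT", "N")]   # both kinds of symbol


def labels(word):
    """Return the set of symbols (POS tag and, if any, semantic category) for a word."""
    entry = LEXICON[word]
    return {entry["pos"], entry["sem"]} - {None}


def matches(rule, words):
    """True if each rule symbol is among the labels of the word in the same position."""
    if len(rule) != len(words):
        return False
    return all(symbol in labels(word) for symbol, word in zip(rule, words))


def covered(rules, words):
    """True if at least one rule in the rule set matches the word sequence."""
    return any(matches(rule, words) for rule in rules)


if __name__ == "__main__":
    for message in ("ship sighted vessel", "ship sighted periscope", "periscope sinking ship"):
        words = message.split()
        print(message,
              "| semantic:", covered(SEMANTIC_RULES, words),
              "| syntactic:", covered(SYNTACTIC_RULES, words),
              "| mixed:", covered(MIXED_RULES, words))
```

In this sketch, "ship sighted periscope" falls outside the purely semantic rules but is accepted by the mixed rules, whereas "periscope sinking ship" is accepted only by the purely syntactic rules. Acceptance by an overly general rule is used here as a rough stand-in for the ambiguity and misparsing discussed in the abstract.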