Using Word Formation Rules to Extend MT Lexicons

Authors:
Claudia Gdaniec; Esmé Manandise
Venue:
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Year:
2002

Abstract

In the IBM LMT Machine Translation (MT) system, a built-in strategy provides lexical coverage for a particular subset of words that are not listed in its bilingual lexicons. The recognition and coding of these words, and the generation of their transfers, are based on a set of derivational morphological rules. A new utility writes unfound words of this type, in an LMT-compatible format, to an auxiliary bilingual lexical file (ALF) that is subsequently merged into the core lexicons. What characterizes this approach is its use of morphological, semantic, and syntactic features for both analysis and transfer. The ALF has to be revised before it is merged into the core lexicons. The utility integrates linguistics-based analysis and transfer rules with a corpus-based method of verifying or falsifying linguistic hypotheses against extensive document translation, which in addition yields statistics on frequencies of occurrence as well as local context.
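
The workflow described in the abstract can be illustrated with a minimal sketch. It is not the LMT implementation: the toy English-German lexicon, the single "V + able" rule, the `AlfEntry` record, and the function names `analyze_unfound` and `build_alf` are all hypothetical, chosen only to show how a derivational rule might analyze an unfound word, generate a candidate transfer, and record a corpus frequency for later human revision.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy core lexicon: source base -> (part of speech, target word).
# These entries are illustrative and do not come from LMT.
CORE_LEXICON = {
    "read": ("verb", "lesen"),
    "drink": ("verb", "trinken"),
}


@dataclass(frozen=True)
class DerivationRule:
    name: str            # label recorded with each proposed entry
    source_suffix: str   # suffix stripped from the unfound source word
    base_pos: str        # part of speech required of the base word
    derived_pos: str     # part of speech assigned to the derived word
    target_strip: str    # ending removed from the target base word
    target_suffix: str   # suffix attached to the remaining target stem


# Assumed rule: English verb + "able" -> German verb stem + "bar".
RULES = [DerivationRule("V+able", "able", "verb", "adj", "en", "bar")]


@dataclass(frozen=True)
class AlfEntry:
    source: str
    pos: str
    target: str
    rule: str
    needs_review: bool = True  # proposed entries await revision before a merge


def analyze_unfound(word: str) -> Optional[AlfEntry]:
    """Propose an entry for a word missing from the core lexicon, if a rule fits."""
    if word in CORE_LEXICON:
        return None  # already covered, nothing to propose
    for rule in RULES:
        if not word.endswith(rule.source_suffix):
            continue
        base = word[: -len(rule.source_suffix)]
        found = CORE_LEXICON.get(base)
        if found is None or found[0] != rule.base_pos:
            continue  # hypothesis falsified: no suitable base in the lexicon
        stem = found[1]
        if rule.target_strip and stem.endswith(rule.target_strip):
            stem = stem[: -len(rule.target_strip)]
        return AlfEntry(word, rule.derived_pos, stem + rule.target_suffix, rule.name)
    return None


def build_alf(tokens: List[str]) -> List[Tuple[AlfEntry, int]]:
    """Collect proposed entries with corpus frequencies, sorted by source word."""
    counts = Counter(token.lower() for token in tokens)
    proposals = []
    for word, freq in sorted(counts.items()):
        entry = analyze_unfound(word)
        if entry is not None:
            proposals.append((entry, freq))
    return proposals


if __name__ == "__main__":
    for entry, freq in build_alf(["readable", "Readable", "drinkable", "read", "table"]):
        print(entry.source, entry.pos, entry.target, entry.rule, freq)
```

In this sketch, "table" ends in "able" but has no verb base in the toy lexicon, so no entry is proposed for it. Every proposed entry is flagged `needs_review`, mirroring the abstract's requirement that the auxiliary file be revised before it is merged.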