Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation

Authors:
Helena M. Caseli;Maria Das Nunes;Mikel L. Forcada
Affiliations:
NILC – ICMC, University of São Paulo, São Carlos, Brazil;NILC – ICMC, University of São Paulo, São Carlos, Brazil;Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, 03071 Alacant, Spain
Venue:
Machine Translation
Year:
2006

Abstract

The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, building such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work. As a consequence, bilingual resources are usually harder to find than "shallow" monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology for automatically building both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks.
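As a rough illustration of the idea of extracting a bilingual dictionary from word-aligned, part-of-speech-tagged parallel text, the following minimal sketch counts aligned lemma pairs and keeps the most frequent translation of each source entry. It is not the authors' algorithm: the function name `induce_dictionary`, the `min_count` threshold, the input layout, and the toy sentences are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def induce_dictionary(aligned_corpus, min_count=2):
    """Hypothetical helper: build a one-best bilingual dictionary.

    aligned_corpus is an iterable of (src_tokens, tgt_tokens, alignment),
    where each token is a (lemma, pos) pair and alignment is a list of
    (source_index, target_index) links.
    """
    counts = defaultdict(Counter)
    for src_tokens, tgt_tokens, alignment in aligned_corpus:
        for i, j in alignment:
            # Count how often each source entry is linked to each target entry.
            counts[src_tokens[i]][tgt_tokens[j]] += 1

    dictionary = {}
    for src_entry, targets in counts.items():
        tgt_entry, n = targets.most_common(1)[0]
        # Keep only correspondences seen at least min_count times (assumed filter).
        if n >= min_count:
            dictionary[src_entry] = tgt_entry
    return dictionary


# Invented toy data: two Portuguese-Spanish sentence pairs with 1:1 alignments.
corpus = [
    ([("o", "det"), ("gato", "n"), ("preto", "adj")],
     [("el", "det"), ("gato", "n"), ("negro", "adj")],
     [(0, 0), (1, 1), (2, 2)]),
    ([("o", "det"), ("livro", "n"), ("preto", "adj")],
     [("el", "det"), ("libro", "n"), ("negro", "adj")],
     [(0, 0), (1, 1), (2, 2)]),
]

bilingual = induce_dictionary(corpus)
```

With the toy corpus above and the default threshold, only the two correspondences seen twice survive ("o" with "el", "preto" with "negro"); lowering `min_count` to 1 would also keep the nouns. A real system would additionally have to handle multi-word units and one-to-many alignments, which this sketch ignores.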