wEBMT: developing and validating an example-based machine translation system using the world wide web

Authors:
Andy Way;Nano Gough
Affiliations:
School of Computing, Dublin City University, Dublin 9, Ireland;School of Computing, Dublin City University, Dublin 9, Ireland
Venue:
Computational Linguistics - Special issue on web as corpus
Year:
2003





Abstract

We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two purposes. First, we populate the system's memory with translations gathered from rule-based MT systems available on the Web. The source strings submitted to these systems were extracted automatically from an extremely small subset of the rule types in the Penn-II Treebank. In subsequent stages, the resulting 〈source, target〉 translation pairs are automatically transformed into a series of resources that render the translation process more successful. Although the output of on-line MT systems is often faulty, we demonstrate in a number of experiments that, when used to seed the memories of an EBMT system, it can nonetheless prove useful in generating translations of high quality in a robust fashion. In addition, we demonstrate the relative gain of EBMT in comparison to on-line systems. Second, despite the perception that documents available on the Web are of questionable quality, we demonstrate that such resources are extremely useful in automatically post-editing translation candidates proposed by our system.
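The general idea of seeding a translation memory with 〈source, target〉 pairs and then reusing them can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`seed_memory`, `translate_by_example`, `toy_translate`), the toy lexicon standing in for an on-line MT system, and the greedy longest-match recombination strategy are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List


def seed_memory(phrases: List[str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Build a <source, target> memory by translating each source phrase once.

    `translate` stands in for an external MT system (hypothetical here).
    """
    return {p.lower(): translate(p) for p in phrases}


def translate_by_example(sentence: str, memory: Dict[str, str]) -> str:
    """Cover the input left to right with the longest phrases found in memory."""
    words = sentence.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate chunk starting at position i first.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in memory:
                out.append(memory[chunk])
                i = j
                break
        else:
            # No stored phrase starts here: pass the word through untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)


# Toy stand-in for an on-line MT system (illustrative data, not from the paper).
TOY_MT = {"the man": "l'homme", "sees": "voit", "the dog": "le chien"}


def toy_translate(phrase: str) -> str:
    return TOY_MT[phrase.lower()]


memory = seed_memory(["the man", "sees", "the dog"], toy_translate)
result = translate_by_example("The man sees the dog", memory)
print(result)
```

In this sketch, words that match no stored phrase are passed through unchanged, which makes gaps in the memory easy to spot. A real system would need alignment, generalization, and the post-editing step the abstract mentions.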