Bitext correspondences through rich mark-up

Authors:
Raquel Martínez;Joseba Abaitua;Arantza Casillas
Affiliations:
Universidad Complutense de Madrid;Universidad de Deusto, Bilbao;Universidad de Alcalá de Henares
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Year:
1998

Abstract

Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of correctly identifying and aligning text segments that are translation equivalents of each other in a parallel corpus. We present a sentence alignment algorithm that, by taking advantage of previously annotated texts, obtains accuracy rates close to 100%. The algorithm evaluates the similarity of the linguistic and extralinguistic mark-up on both sides of a bitext. Given that annotations are neutral with respect to typological, grammatical and orthographical differences between languages, rich mark-up becomes an optimal foundation to support bitext correspondences. The main originality of this approach is that it makes maximal use of annotations, which is a sensible and efficient way to exploit parallel corpora when annotations exist.
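
The idea of comparing mark-up rather than words can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes that each sentence carries inline SGML-style tags and scores a candidate sentence pair by the Dice coefficient over the tag names found on each side. The tag names (`rs`, `date`, `num`), the sample sentences, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Matches opening tags such as <date> or <rs type="place">; closing tags are skipped.
TAG_RE = re.compile(r"<([A-Za-z][\w-]*)[^>]*>")


def tag_counts(sentence: str) -> Counter:
    """Count opening tags by name, ignoring the words between them."""
    return Counter(name.lower() for name in TAG_RE.findall(sentence))


def markup_similarity(src: str, tgt: str) -> float:
    """Dice coefficient over the two tag multisets (assumed measure, not the paper's)."""
    a, b = tag_counts(src), tag_counts(tgt)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # no mark-up on either side, so there is no evidence to compare
    shared = sum((a & b).values())  # multiset intersection: tags present on both sides
    return 2 * shared / total


# Hypothetical annotated sentence pair; the words differ, the annotations do not.
src = "<rs>Bilbao</rs>, <date>5 de enero de 1998</date>."
tgt = "<date>5 January 1998</date>, <rs>Bilbao</rs>."

score = markup_similarity(src, tgt)
```

Because only tag names are compared, the score does not depend on word order, morphology or spelling in either language, which is the property the abstract attributes to annotations. A full aligner would still need a search procedure over candidate sentence pairs, which this sketch leaves out.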