Identifying word correspondence in parallel texts
HLT '91 Proceedings of the workshop on Speech and Natural Language
Translating collocations for bilingual lexicons: a statistical approach
Computational Linguistics
Text Encoding Initiative: Background and Contexts
Text Encoding Initiative: Background and Contexts
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing
Termight: identifying and translating technical terminology
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
Automating the acquisition of bilingual terminology
EACL '93 Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics
A DP based search using monotone alignments in statistical translation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
An alignment method for noisy parallel corpora based on image processing techniques
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A portable algorithm for mapping bitext correspondence
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Experiments and prospects of Example-Based Machine Translation
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An algorithm for finding noun phrase correspondences in bilingual corpora
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Towards automatic extraction of monolingual and bilingual terminology
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Evaluation of an algorithm for the recognition and classification of proper names
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Recycling Annotated Parallel Corpora for Bilingual Document Composition
AMTA '00 Proceedings of the 4th Conference of the Association for Machine Translation in the Americas on Envisioning Machine Translation in the Information Future
DTD-driven bilingual document generation
INLG '00 Proceedings of the first international conference on Natural language generation - Volume 14
Spanish-basque parallel corpus structure: linguistic annotations and translation units
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
Graph-based bilingual sentence alignment from large scale web pages
NLDB'11 Proceedings of the 16th international conference on Natural language processing and information systems
Hi-index | 0.00 |
Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of providing correct identification and alignment methods for text segments that are translation equivalences of each other in a parallel corpus. We present a sentence alignment algorithm that, by taking advantage of previously annotated texts, obtains accuracy rates close to 100%. The algorithm evaluates the similarity of the linguistic and extralinguistic mark-up in both sides of a bitext. Given that annotations are neutral with respect to typological, grammatical and orthographical differences between languages, rich mark-up becomes an optimal foundation to support bitext correspondences. The main originality of this approach is that it makes maximal use of annotations, which is a very sensible and efficient method for the exploitation of parallel corpora when annotations exist.