Exploiting aligned parallel corpora in multilingual studies and applications

Authors:
Dan Tufis
Affiliations:
Research Institute for Artificial Intelligence, Romanian Academy, Bucharest, Romania
Venue:
IWIC'07 Proceedings of the 1st international conference on Intercultural collaboration
Year:
2007


Abstract

Parallel corpora encode extremely valuable linguistic knowledge, and recent advances in multilingual corpus linguistics make that knowledge easier to uncover. The linguistic decisions human translators make to convey the meaning of the source text faithfully can be traced and used as evidence for linguistic facts that, in a monolingual context, might be unavailable to (or overlooked by) a computer program. Multilingual technologies, which are to a large extent language independent, provide powerful support for systematic and consistent cross-lingual studies and make it easier to build annotated linguistic resources for languages where such resources are scarce or missing. In this paper we briefly present some of the underlying multilingual technologies and methodologies we developed for exploiting parallel corpora, and we discuss their relevance for cross-linguistic studies and applications.