Multilingual lexical database generation from parallel texts in 20 European languages with endogenous resources

Authors:
Emmanuel Giguet; Pierre-Sylvain Luquet
Affiliations:
Université de Caen, Caen Cedex - France; Université de Caen, Caen Cedex - France
Venue:
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Year:
2006

Abstract

This paper deals with the generation of multilingual lexical databases from parallel corpora. The aim is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies only on the raw texts and does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the Acquis Communautaire corpus.
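The abstract does not describe the alignment algorithm itself. To illustrate what "endogenous" can mean in practice, here is a minimal sketch of a generic association measure computed from sentence-aligned raw text alone: the Dice coefficient over sentence-level co-occurrence, Dice(s, t) = 2·c(s, t) / (c(s) + c(t)). Tokenization is plain lowercasing and whitespace splitting, with no stemmer or tagger. This is an assumed stand-in, not the authors' method. The function names `dice_alignments` and `best_translation`, the thresholds `min_score` and `min_count`, and the toy sentences are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy English-French bitext (invented, NOT taken from the Acquis corpus).
SAMPLE = [
    ("the directive applies", "la directive s'applique"),
    ("the member states", "les états membres"),
    ("this directive enters into force", "cette directive entre en vigueur"),
    ("member states shall comply", "les états membres se conforment"),
    ("states may derogate", "les états peuvent déroger"),
    ("the rules apply", "les règles s'appliquent"),
]


def dice_alignments(bitext, min_score=0.5, min_count=2):
    """Score source/target word pairs by the Dice coefficient of their
    sentence-level co-occurrence, using only the raw aligned sentences.
    (Hypothetical helper; thresholds are illustrative assumptions.)"""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src_sent, tgt_sent in bitext:
        # Count each word type at most once per sentence.
        src_types = set(src_sent.lower().split())
        tgt_types = set(tgt_sent.lower().split())
        src_freq.update(src_types)
        tgt_freq.update(tgt_types)
        pair_freq.update(product(src_types, tgt_types))
    scores = {}
    for (s, t), joint in pair_freq.items():
        if joint < min_count:  # skip pairs seen together too rarely
            continue
        dice = 2 * joint / (src_freq[s] + tgt_freq[t])
        if dice >= min_score:
            scores[(s, t)] = dice
    return scores


def best_translation(scores, word):
    """Return the highest-scoring target word for `word`, or None.
    Ties are broken by comparing the target strings."""
    candidates = [(score, t) for (s, t), score in scores.items() if s == word]
    return max(candidates)[1] if candidates else None


scores = dice_alignments(SAMPLE)
print(best_translation(scores, "states"))  # états
```

On this toy sample, "states" pairs best with "états" (Dice 1.0), but the frequent article "les" still scores 6/7. Raw co-occurrence alone tends to favor frequent function words, so a practical endogenous system would need additional clues.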