MULTEXT: Multilingual Text Tools and Corpora

Authors:
Nancy Ide;Jean Véronis
Affiliations:
Université de Provence, Aix-en-Provence, France;Université de Provence, Aix-en-Provence, France
Venue:
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Year:
1994

Abstract

MULTEXT (Multilingual Text Tools and Corpora) is the largest project funded under the Commission of the European Communities' Linguistic Research and Engineering Program. The project will contribute to the development of generally usable software tools for manipulating and analysing text corpora, and will create multilingual text corpora with structural and linguistic markup. It will attempt to establish conventions for encoding such corpora, building on and contributing to the preliminary recommendations of the relevant international and European standardization initiatives. MULTEXT will also work towards a set of guidelines for text software development, which will be published widely to enable future development by others. All tools and data developed within the project will be made freely and publicly available.