TectoMT: modular NLP framework

Authors: Martin Popel, Zdeněk Žabokrtský
Affiliation (both authors): Charles University in Prague, Institute of Formal and Applied Linguistics
Venue: IceTAL'10, Proceedings of the 7th International Conference on Advances in Natural Language Processing
Year: 2010

Abstract

In this paper we describe TectoMT, a multi-purpose open-source NLP framework. It allows for fast and efficient development of NLP applications by exploiting a wide range of software modules already integrated in TectoMT, such as tools for sentence segmentation, tokenization, morphological analysis, POS tagging, shallow and deep syntactic parsing, named entity recognition, anaphora resolution, tree-to-tree translation, natural language generation, and word-level alignment of parallel corpora. One of the most complex applications of TectoMT is the English-Czech machine translation system with transfer on the deep syntactic (tectogrammatical) layer. Several modules are also available for other languages (German, Russian, Arabic). Where possible, modules are implemented in a language-independent way, so that they can be reused in many applications.
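The general idea of chaining reusable modules can be illustrated with a minimal sketch. It is written in Python for illustration only and is not TectoMT's actual API: the names `Document`, `segment_sentences`, `tokenize`, and `run_pipeline` are hypothetical. Each module reads the annotation added by earlier modules and adds its own layer, and because these two toy modules contain no language-specific rules, the same module list can be applied unchanged to documents in different languages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical document: raw text plus annotation layers filled in by modules."""
    text: str
    language: str
    layers: Dict[str, list] = field(default_factory=dict)


# Hypothetical module type: any callable that reads and enriches a Document in place.
Module = Callable[[Document], None]


def segment_sentences(doc: Document) -> None:
    # Naive, language-independent rule: split on periods and re-attach them.
    sentences = [s.strip() + "." for s in doc.text.split(".") if s.strip()]
    doc.layers["sentences"] = sentences


def tokenize(doc: Document) -> None:
    # Whitespace tokenization that splits off trailing punctuation.
    # Relies on the "sentences" layer produced by the previous module.
    tokens = []
    for sentence in doc.layers["sentences"]:
        sent_tokens = []
        for word in sentence.split():
            if len(word) > 1 and word[-1] in ".,!?":
                sent_tokens.extend([word[:-1], word[-1]])
            else:
                sent_tokens.append(word)
        tokens.append(sent_tokens)
    doc.layers["tokens"] = tokens


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    """Apply modules in order; each one sees the layers added by earlier ones."""
    for module in modules:
        module(doc)
    return doc


if __name__ == "__main__":
    pipeline = [segment_sentences, tokenize]
    en = run_pipeline(Document("TectoMT is modular. Modules are reused.", "en"), pipeline)
    de = run_pipeline(Document("Der Satz ist kurz. Er endet hier.", "de"), pipeline)
    print(en.layers["tokens"])
    # [['TectoMT', 'is', 'modular', '.'], ['Modules', 'are', 'reused', '.']]
    print(de.layers["tokens"])
```

Keeping each module as a plain callable with a fixed contract (read existing layers, add a new one) is what makes the modules interchangeable in this sketch; a real framework would also need to handle language-specific modules, configuration, and richer tree-shaped annotation.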