OWL/DL formalization of the MULTEXT-East morphosyntactic specifications

Authors:
Christian Chiarcos;Tomaž Erjavec
Affiliations:
University of Potsdam, Germany;Jožef Stefan Institute, Slovenia
Venue:
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Year:
2011


Abstract

This paper describes the modeling of the morphosyntactic annotations of the MULTEXT-East corpora and lexicons as an OWL/DL ontology. Formalizing annotation schemes in OWL/DL makes it possible to specify the interrelationships between features formally and to draw logical inferences from those relationships. We show that this approach provides a top-down perspective on a large set of morphosyntactic specifications for multiple languages, and that this perspective helps to identify and resolve conceptual problems in the original specifications. Furthermore, the ontological modeling allows us to link the MULTEXT-East specifications with repositories of annotation terminology such as the General Ontology for Linguistic Description (GOLD) or the ISO TC37/SC4 Data Category Registry.