Towards the automatic extraction of definitions in Slavic

  • Authors:
  • Adam Przepiórkowski; Łukasz Degórski; Beata Wójtowicz; Miroslav Spousta; Vladislav Kuboň; Kiril Simov; Petya Osenova; Lothar Lemnitzer

  • Affiliations:
  • Institute of Computer Science, Warsaw, Poland; Institute of Computer Science, Warsaw, Poland; Institute of Computer Science, Warsaw, Poland; Charles University, Prague, Czech Republic; Charles University, Prague, Czech Republic; Institute for Parallel Processing BAS, Sofia, Bulgaria; Institute for Parallel Processing BAS, Sofia, Bulgaria; University of Tübingen, Tübingen, Germany

  • Venue:
  • ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
  • Year:
  • 2007

Abstract

This paper presents the results of preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from e-learning texts in Bulgarian, Czech and Polish, which are typically unstructured or only weakly structured. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results are less than satisfactory, and we claim that the reason is the intrinsic difficulty of the task, as measured by the low interannotator agreement. This calls for more sophisticated, deeper linguistic processing, as well as for the use of machine learning classification techniques.
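
To make the idea of a regular grammar over XML-encoded, morphosyntactically annotated text more concrete, the following is a minimal sketch and not the grammar used in the paper. Everything in it is an assumption made for illustration: the element names (`doc`, `s`, `tok`), the attributes (`base`, `ctag`), the coarse tag set, the single copula rule, and the toy English sample standing in for Bulgarian, Czech or Polish input. It maps each token to a symbol and applies one regular pattern, "term, copula, noun phrase", to each sentence.

```python
import re
import xml.etree.ElementTree as ET

# Toy annotated input. Element and attribute names are hypothetical,
# chosen only for this sketch; the paper's actual encoding may differ.
SAMPLE = """<doc>
  <s id="s1">
    <tok base="algorithm" ctag="NOUN">Algorithm</tok>
    <tok base="be" ctag="VERB">is</tok>
    <tok base="a" ctag="DET">a</tok>
    <tok base="finite" ctag="ADJ">finite</tok>
    <tok base="sequence" ctag="NOUN">sequence</tok>
    <tok base="of" ctag="PREP">of</tok>
    <tok base="instruction" ctag="NOUN">instructions</tok>
  </s>
  <s id="s2">
    <tok base="we" ctag="PRON">We</tok>
    <tok base="run" ctag="VERB">ran</tok>
    <tok base="the" ctag="DET">the</tok>
    <tok base="algorithm" ctag="NOUN">algorithm</tok>
  </s>
</doc>"""

# One illustrative rule over the symbol sequence of a sentence:
# optional adjectives, one or more nouns (the term), a copula,
# then an optional determiner, optional adjectives and a noun.
PATTERN = re.compile(r"^(?:ADJ )*(?:NOUN )+COP (?:DET )?(?:ADJ )*NOUN(?: |$)")


def symbol(tok):
    """Map a token to a grammar symbol: COP for the lemma 'be', else its tag."""
    if tok.get("base") == "be":
        return "COP"
    return tok.get("ctag")


def definition_candidates(xml_text):
    """Return the ids of sentences whose symbol sequence matches PATTERN."""
    root = ET.fromstring(xml_text)
    hits = []
    for sentence in root.iter("s"):
        symbols = " ".join(symbol(t) for t in sentence.iter("tok"))
        if PATTERN.search(symbols):
            hits.append(sentence.get("id"))
    return hits


if __name__ == "__main__":
    print(definition_candidates(SAMPLE))
```

A rule this shallow accepts any copular sentence that begins with a noun, whether or not it is a definition, which is one way to see why the abstract calls for deeper linguistic processing and machine learning classification.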