Extraction of definitions using grammar-enhanced machine learning

  • Authors:
  • Eline Westerhout

  • Affiliations:
  • Utrecht University, Utrecht, The Netherlands

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
  • Year:
  • 2009

Abstract

In this paper we compare different approaches to extracting definitions of four types using a combination of a rule-based grammar and machine learning. We collected a Dutch text corpus containing 549 definitions and applied a grammar to it. Machine learning was then applied to improve the results obtained with the grammar. Two machine learning experiments were carried out. In the first experiment, a standard classifier and a classifier designed specifically to deal with imbalanced datasets are compared. For most definition types, the classifier designed for imbalanced datasets outperforms the standard classifier. In the second experiment we show that classification results improve when information on definition structure is included.