Definition extraction using linguistic and structural features

  • Authors: Eline Westerhout
  • Affiliations: Utrecht University
  • Venue: WDE '09 Proceedings of the 1st Workshop on Definition Extraction
  • Year: 2009

Abstract

This paper combines linguistic and structural information to extract Dutch definitions. The corpus is a collection of Dutch texts on computing and e-learning that contains 603 definitions. Extraction proceeds in two steps. First, a parser is applied to the complete corpus, using a grammar based on the patterns observed in the definitions. Machine learning is then applied to improve the results obtained with the grammar. The experiments show that combining linguistic information (n-grams, type of article, type of noun) with structural information (layout, position) is a promising approach to the definition extraction task.
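
The two-step idea can be illustrated with a minimal Python sketch. It is not the paper's grammar or feature set: the single regular expression stands in for a hand-written grammar, it targets English "X is a Y" sentences rather than Dutch, and the names `IS_PATTERN`, `find_candidates`, and `extract_features` are hypothetical. Step 1 collects candidate sentences; step 2 turns each candidate into linguistic and structural features that a classifier could then use to filter out false positives.

```python
import re

# Assumption: one toy English pattern standing in for the grammar.
# It matches "[article] term is a/an/the noun ..." at the start of a sentence.
IS_PATTERN = re.compile(
    r"^(?P<article>(?:a|an|the)\s+)?"      # optional article before the term
    r"(?P<term>[\w-]+(?:\s[\w-]+){0,3})"   # term of one to four words
    r"\s+is\s+"
    r"(?P<pred_article>a|an|the)\s+\w+",   # article opening the definiens
    re.IGNORECASE,
)


def article_type(word):
    """Map an article (or None) to a coarse category."""
    if word is None:
        return "none"
    word = word.strip().lower()
    if word in ("a", "an"):
        return "indefinite"
    if word == "the":
        return "definite"
    return "none"


def find_candidates(paragraph):
    """Step 1: return (sentence index, match) for every sentence the pattern accepts."""
    candidates = []
    for index, sentence in enumerate(paragraph):
        match = IS_PATTERN.match(sentence)
        if match:
            candidates.append((index, match))
    return candidates


def extract_features(index, match):
    """Step 2 input: linguistic and structural features for one candidate."""
    tokens = match.string.lower().split()
    return {
        "subject_article": article_type(match.group("article")),
        "predicate_article": article_type(match.group("pred_article")),
        "first_bigram": " ".join(tokens[:2]),   # a simple n-gram feature
        "position": index,                      # structural: place in paragraph
        "first_in_paragraph": index == 0,
    }


paragraph = [
    "A browser is a program that displays web pages.",
    "The meeting is on Monday.",
    "It is a good idea to save often.",
]
candidates = find_candidates(paragraph)
feature_rows = [extract_features(i, m) for i, m in candidates]
```

In this toy paragraph the pattern accepts the first and third sentences. The third is not a definition, which is the kind of false positive the second step is meant to remove: its features (no article before the subject, not at the start of the paragraph) differ from those of the genuine definition, so a classifier trained on labeled candidates has something to work with.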