Definition extraction using linguistic and structural features

Author: Eline Westerhout
Affiliation: Utrecht University
Venue: WDE '09: Proceedings of the 1st Workshop on Definition Extraction
Year: 2009



Abstract

In this paper, a combination of linguistic and structural information is used to extract Dutch definitions. The corpus is a collection of Dutch texts on computing and e-learning containing 603 definitions. The extraction process consists of two steps. First, a parser using a grammar based on the patterns observed in the definitions is applied to the complete corpus. Machine learning is then applied to improve the results obtained with the grammar. The experiments show that combining linguistic information (n-grams, type of article, type of noun) with structural information (layout, position) is a promising approach to the definition extraction task.
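The two-step set-up (grammar-based candidate selection followed by feature-based filtering) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the author's implementation: a single regular expression for Dutch copular sentences ("X is een/de/het Y") stands in for the grammar, and the `Sentence` fields `position`, `term_in_bold` and `noun_type` are hypothetical inputs assumed to come from the document markup and an external part-of-speech tagger.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Toy stand-in for the definition grammar (assumption): Dutch copular
# sentences of the form "<term> is een|de|het <rest>".
IS_PATTERN = re.compile(
    r"^(?P<term>.+?)\s+is\s+(?P<article>een|de|het)\s+\S+", re.IGNORECASE
)


@dataclass
class Sentence:
    text: str
    position: int        # hypothetical: index of the sentence within its paragraph
    term_in_bold: bool   # hypothetical layout flag read from the source markup
    noun_type: str       # hypothetical: noun type supplied by an external tagger


def grammar_candidates(sentences):
    """Step 1: keep only sentences that the (toy) grammar pattern matches."""
    return [s for s in sentences if IS_PATTERN.match(s.text)]


def features(s):
    """Step 2 input: linguistic and structural features for one candidate."""
    match = IS_PATTERN.match(s.text)
    tokens = re.findall(r"\w+", s.text.lower())
    # Linguistic features: word bigram counts.
    feats = Counter(f"bigram={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    # Linguistic feature: type of article after the copula ("een" is indefinite).
    article = match.group("article").lower()
    feats["article=" + ("indefinite" if article == "een" else "definite")] = 1
    # Linguistic feature: noun type, taken as given from the tagger.
    feats["noun=" + s.noun_type] = 1
    # Structural features: layout and position.
    feats["layout=bold"] = int(s.term_in_bold)
    feats["position=first"] = int(s.position == 0)
    return dict(feats)


if __name__ == "__main__":
    sentences = [
        Sentence("Een router is een apparaat dat netwerken verbindt.", 0, True, "common"),
        Sentence("Klik daarna op de knop.", 1, False, "common"),
    ]
    # Only the first sentence passes step 1.
    for cand in grammar_candidates(sentences):
        print(cand.text, features(cand))
```

The feature dictionaries produced for the grammar's candidates could then be vectorised and passed to any classifier for the second step; the choice of learner is deliberately left open in this sketch.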