Extraction of definitions using grammar-enhanced machine learning

Authors:
Eline Westerhout
Affiliations:
Utrecht University, Utrecht, The Netherlands
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
Year:
2009



Abstract

In this paper we compare different approaches to extracting definitions of four types using a combination of a rule-based grammar and machine learning. We collected a Dutch text corpus containing 549 definitions and applied a grammar to it. Machine learning was then applied to improve the results obtained with the grammar. Two machine learning experiments were carried out. In the first experiment, a standard classifier is compared with a classifier designed specifically to deal with imbalanced datasets. For most definition types, the classifier designed for imbalanced datasets outperforms the standard classifier. In the second experiment, we show that classification results improve when information on definition structure is included.
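The abstract describes a two-stage pipeline: a rule-based grammar proposes candidate definition sentences, and a classifier then filters those candidates, with one classifier designed for imbalanced data. The following is a minimal Python sketch of that pipeline under stated assumptions. The regex `IS_DEF` is a hypothetical stand-in for the paper's grammar and covers only a simple "is"-type pattern. The structural features `starts_with_article` and `early_copula` are illustrative and are not the paper's feature set. `BalancedStumpEnsemble` is a simplified balanced-bagging ensemble of one-feature decision stumps. It borrows the per-learner class-balanced bootstrap used by balanced random forests, but it is not a random forest and not the author's classifier.

```python
import random
import re

# Hypothetical grammar rule (an assumption, not the paper's grammar):
# an "is"-type Dutch definition such as "X is een Y" or "X zijn de Y".
IS_DEF = re.compile(
    r"^\s*(?P<term>[\w\- ]+?)\s+(?:is|zijn)\s+(?:een|de|het)\b", re.IGNORECASE
)


def grammar_candidates(sentences):
    """Stage 1: keep only sentences matched by the hypothetical grammar rule."""
    return [s for s in sentences if IS_DEF.search(s)]


def features(sentence):
    """Binary features: word presence plus two hypothetical structural cues."""
    tokens = re.findall(r"\w+", sentence.lower())
    feats = {f"w={t}" for t in tokens}
    # Illustrative structural cues; the paper's actual features may differ.
    if tokens and tokens[0] in {"een", "de", "het"}:
        feats.add("starts_with_article")
    if "is" in tokens[:4]:
        feats.add("early_copula")
    return feats


def balanced_bootstrap(X, y, rng):
    """Sample with replacement n positives and n negatives, n = minority size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n = min(len(pos), len(neg))
    idx = [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, vocab):
    """Pick the one feature whose presence best predicts label 1."""
    best, best_acc = None, -1.0
    for f in sorted(vocab):
        acc = sum((f in x) == (label == 1) for x, label in zip(X, y)) / len(y)
        if acc > best_acc:  # strict '>' keeps the alphabetically first on ties
            best, best_acc = f, acc
    return best


class BalancedStumpEnsemble:
    """Majority vote over stumps, each fit on its own class-balanced bootstrap."""

    def __init__(self, n_estimators=25, seed=0):
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)
        self.stumps = []

    def fit(self, X, y):
        vocab = set().union(*X)
        self.stumps = []
        for _ in range(self.n_estimators):
            Xb, yb = balanced_bootstrap(X, y, self.rng)
            self.stumps.append(fit_stump(Xb, yb, vocab))
        return self

    def predict(self, x):
        # Predict 1 only when a strict majority of stumps fire on x.
        votes = sum(stump in x for stump in self.stumps)
        return int(2 * votes > len(self.stumps))
```

In a real run, `X` would be `[features(s) for s in grammar_candidates(corpus)]` with a gold label for each candidate. Training a standard classifier on the unbalanced candidates instead of `BalancedStumpEnsemble` roughly mirrors the comparison in the first experiment. Dropping the two structural cues from `features` roughly mirrors the second. Balancing each bootstrap keeps the many non-definition candidates from dominating every learner, which is the motivation for imbalance-aware classifiers.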