Language independent system for definition extraction: first results using learning algorithms

Authors:
Rosa Del Gaudio; António Branco
Affiliations:
University of Lisbon, Lisbon, Portugal; University of Lisbon, Lisbon, Portugal
Venue:
WDE '09 Proceedings of the 1st Workshop on Definition Extraction
Year:
2009

Abstract

In this paper we report on the performance of different learning algorithms and different sampling techniques applied to a definition extraction task, using data sets in different languages. We compare our results with those obtained by handcrafted rules for extracting definitions. When definition extraction is handled with machine learning algorithms, two issues arise. On the one hand, in most cases the data set used to extract definitions is imbalanced: definitional sentences are far fewer than non-definitional ones, so specific techniques are needed to deal with this skew. On the other hand, the same methods can be used to extract definitions from corpora in different languages, making the classifier language independent.
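The abstract does not say which sampling techniques were compared. Purely as an illustration, the following Python sketch shows two common baselines for an imbalanced definition/non-definition data set: random undersampling of the majority class and random oversampling of the minority class. The function names, the `"def"`/`"nodef"` labels, and the fixed `seed` are hypothetical choices for this example, not the authors' setup.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Group samples by their label, preserving first-seen label order."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples from larger classes (without replacement)
    until every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def random_oversample(samples, labels, seed=0):
    """Keep every sample and add random duplicates (drawn with replacement)
    of smaller classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out_x, out_y = [], []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical toy data: 2 definitional vs 8 non-definitional sentences.
    sentences = [f"s{i}" for i in range(10)]
    labels = ["def"] * 2 + ["nodef"] * 8
    _, uy = random_undersample(sentences, labels)
    print(Counter(uy))  # Counter({'def': 2, 'nodef': 2})
    _, oy = random_oversample(sentences, labels)
    print(Counter(oy))  # Counter({'def': 8, 'nodef': 8})
```

Undersampling discards possibly useful non-definitions, while oversampling only repeats existing definitions and can encourage overfitting to them. In either case, resampling is normally applied to the training split only, so that evaluation still reflects the real class distribution.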