In this paper we report on the performance of different learning algorithms and different sampling techniques applied to a definition extraction task, using data sets in different languages. We compare our results with those obtained by handcrafted rules for extracting definitions. When definition extraction is handled with machine learning algorithms, two issues arise. On the one hand, the data set used to extract definitions is in most cases imbalanced, so this characteristic must be handled with specific techniques. On the other hand, the same methods can be used to extract definitions from documents in corpora of different languages, which makes the classifier language-independent.
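Because definition sentences are usually far rarer than non-definition sentences, the training set is skewed toward the majority class. As a minimal sketch of what "sampling techniques" can mean here, the code below shows two generic ones: random undersampling (discard majority-class examples) and random oversampling (duplicate minority-class examples). This is not presented as the authors' method; the function names, the toy sentence IDs, and the label convention (1 = definition, 0 = non-definition) are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Group samples into a dict mapping each label to its list of samples."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly drop examples (without replacement) so every class
    ends up with as many examples as the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples (with replacement) so every
    class ends up with as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy training set: 2 definition sentences, 8 non-definitions.
sentences = [f"s{i}" for i in range(10)]
labels = [1, 1] + [0] * 8

ux, uy = random_undersample(sentences, labels)
ox, oy = random_oversample(sentences, labels)
print(sorted(Counter(uy).items()))  # [(0, 2), (1, 2)]
print(sorted(Counter(oy).items()))  # [(0, 8), (1, 8)]
```

Undersampling throws away information from the majority class, while oversampling can encourage overfitting to the duplicated minority examples; methods such as SMOTE instead generate synthetic minority examples. In either case, resampling would normally be applied only to the training split, so that evaluation still reflects the real class distribution.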