In this paper, a combination of linguistic and structural information is used to extract Dutch definitions. The corpus is a collection of Dutch texts on computing and e-learning containing 603 definitions. The extraction process consists of two steps. First, a parser using a grammar based on the patterns observed in the definitions is applied to the complete corpus. Machine learning is then applied to improve the results obtained with the grammar. The experiments show that combining linguistic information (n-grams, type of article, type of noun) with structural information (layout, position) is a promising approach to the definition extraction task.
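
The two-step process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the connector patterns are a hypothetical stand-in for the grammar, the `Sentence` fields and the feature names are assumptions made for illustration, and the "type of noun" feature is left out because it would need a part-of-speech tagger. The abstract does not name the learning algorithm, so the sketch stops at the feature dictionaries that a classifier of choice could consume.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the grammar: a few Dutch connector phrases
# that often link a term to its definition ("X is een Y", "X betekent Y").
CONNECTOR = re.compile(r"\b(is een|is de|is het|betekent|noemen we)\b", re.IGNORECASE)

DEFINITE = {"de", "het"}
INDEFINITE = {"een"}


@dataclass
class Sentence:
    text: str
    position: int  # assumed: index of the sentence within its paragraph
    layout: str    # assumed: coarse layout label such as "plain" or "bold"


def is_candidate(sentence: Sentence) -> bool:
    """Step 1: keep only sentences matching a definition-like pattern."""
    return CONNECTOR.search(sentence.text) is not None


def article_type(tokens: list[str]) -> str:
    """Classify the sentence-initial article as definite, indefinite, or none."""
    if not tokens:
        return "none"
    if tokens[0] in DEFINITE:
        return "definite"
    if tokens[0] in INDEFINITE:
        return "indefinite"
    return "none"


def extract_features(sentence: Sentence) -> dict:
    """Step 2 input: linguistic (bigrams, article) and structural features."""
    tokens = re.findall(r"\w+", sentence.text.lower())
    features = {f"bigram={a}_{b}": 1 for a, b in zip(tokens, tokens[1:])}
    features["article"] = article_type(tokens)
    features["position"] = sentence.position
    features["layout"] = sentence.layout
    return features


def candidates_with_features(sentences: list[Sentence]) -> list[tuple[Sentence, dict]]:
    """Run the pattern filter, then attach features to each surviving candidate."""
    return [(s, extract_features(s)) for s in sentences if is_candidate(s)]


if __name__ == "__main__":
    corpus = [
        Sentence("Een browser is een programma om webpagina's te bekijken.", 0, "plain"),
        Sentence("Klik daarna op de knop.", 1, "plain"),
    ]
    for sentence, features in candidates_with_features(corpus):
        print(sentence.text, features["article"], features["position"])
```

In this sketch the pattern filter is deliberately permissive, on the assumption that the learning step is there to reject the false positives the patterns let through; each feature dictionary would then be vectorized and passed to a classifier trained on sentences labeled as definitions or non-definitions.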