Definition Extraction with Balanced Random Forests

  • Authors: Łukasz Kobyliński; Adam Przepiórkowski

  • Affiliations: Institute of Computer Science, Warsaw University of Technology, Warszawa, Poland 00-665; Institute of Computer Science, Polish Academy of Sciences, Warszawa, Poland 01-237 and Institute of Informatics, University of Warsaw, Warszawa, Poland 02-097

  • Venue: GoTAL '08: Proceedings of the 6th International Conference on Advances in Natural Language Processing
  • Year: 2008

Abstract

We propose a novel machine learning approach to the task of identifying definitions in Polish documents. Taking into account the specifics of the problem domain and the characteristics of the available dataset, we carefully choose and adapt a classification method to highly imbalanced and noisy data. We evaluate the performance of a Random Forest-based classifier in extracting definitional sentences from natural language text and compare the results with previous work.
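
The abstract does not spell out how the classifier is adapted to imbalanced data. As a rough illustration only, the sampling step commonly associated with balanced random forests can be sketched as follows: each tree is trained on a bootstrap sample of the minority class together with an equally sized sample, drawn with replacement, from the majority class. The function name `balanced_bootstrap`, the toy labels, and the 1:9 class ratio are assumptions made for this sketch and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return training-set indices for a single tree.

    Every class is sampled with replacement down to the size of the
    smallest class, so each tree sees a class-balanced sample.
    (Hypothetical helper, not the authors' implementation.)
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    n_minority = min(len(indices) for indices in by_class.values())

    sample = []
    for indices in by_class.values():
        # random.choices draws with replacement, i.e. a bootstrap draw.
        sample.extend(rng.choices(indices, k=n_minority))
    return sample


# Made-up labels: 1 = definitional sentence, 0 = any other sentence.
labels = [1] * 10 + [0] * 90
rng = random.Random(0)

sample = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in sample)
print(counts[1], counts[0])
```

In a full forest this draw would be repeated once per tree, with a decision tree grown on each sample and the trees' votes aggregated; those steps are omitted here.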