Dealing with Small, Noisy and Imbalanced Data

  • Authors:
  • Adam Przepiórkowski; Michał Marcińczuk; Łukasz Degórski

  • Affiliations:
  • Institute of Computer Science, Polish Academy of Sciences, Warsaw, and Institute of Informatics, Warsaw University; Institute of Applied Informatics, Wrocław University of Technology; Institute of Computer Science, Polish Academy of Sciences, Warsaw

  • Venue:
  • TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
  • Year:
  • 2008

Abstract

This paper addresses the task of definition extraction when the training corpus is small, noisy and heavily imbalanced. A previous approach, based on manually constructed shallow grammars, turns out to be hard to outperform even with such robust classifiers as SVMs, AdaBoost and simple ensembles of classifiers. However, a linear combination of several such classifiers with the manual grammars significantly improves on the results of the grammars alone.
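
The abstract does not say how the linear combination is computed, so the following is only a minimal sketch of the general idea: each component (a manual grammar or a trained classifier) casts a 0/1 vote on whether a sentence is a definition, and the votes are combined as a weighted sum compared against a threshold. The function name `combine_votes`, the weights and the threshold are illustrative assumptions, not values taken from the paper.

```python
from typing import Sequence


def combine_votes(votes: Sequence[int],
                  weights: Sequence[float],
                  threshold: float) -> bool:
    """Return True when the weighted sum of 0/1 votes reaches the threshold."""
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    score = sum(w * v for w, v in zip(weights, votes))
    return score >= threshold


# Hypothetical setup: votes are ordered [grammar, SVM, AdaBoost].
# The weights below are made up for illustration only.
weights = [0.5, 0.25, 0.25]

print(combine_votes([1, 0, 0], weights, threshold=0.5))  # grammar alone
print(combine_votes([0, 1, 0], weights, threshold=0.5))  # one classifier alone
print(combine_votes([0, 1, 1], weights, threshold=0.5))  # two classifiers agree
```

With these made-up weights, the grammar's vote alone is enough to accept a sentence, a single classifier is not, and two agreeing classifiers can override the grammar. In practice the weights and threshold would be tuned on held-out data.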