Forgetting Exceptions is Harmful in Language Learning

  • Authors:
  • Walter Daelemans, Antal van den Bosch, Jakub Zavrel

  • Affiliations:
  • ILK / Computational Linguistics, Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands. walter@kub.nl, antalb@kub.nl, zavrel@kub.nl

  • Venue:
  • Machine Learning - Special issue on natural language learning
  • Year:
  • 1999

Abstract

We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
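The editing setup the abstract describes can be pictured with a minimal sketch. The Python below is a hypothetical illustration, not the authors' implementation (the paper uses memory-based learners such as TiMBL with weighted overlap metrics over symbolic features): class prediction strength is approximated here as leave-one-out nearest-neighbour class agreement, instances below a threshold are edited out of memory, and classification is plain k-NN over the retained instances. The function names, the threshold value, and the Euclidean metric are assumptions made for the example.

```python
import numpy as np

def class_prediction_strength(X, y, k=1):
    """Leave-one-out approximation of class prediction strength:
    the fraction of an instance's k nearest neighbours (excluding
    itself) that share its class. Low values mark exceptional
    instances."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d, np.inf)          # exclude the instance itself
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    return (y[nn] == y[:, None]).mean(axis=1)

def edit_exceptions(X, y, threshold=0.5, k=1):
    """Remove instances whose strength falls below the threshold --
    the kind of editing the paper finds harmful on NLP tasks."""
    keep = class_prediction_strength(X, y, k) >= threshold
    return X[keep], y[keep]

def knn_predict(X_train, y_train, X_test, k=1):
    """Memory-based (k-NN) classification over the retained memory."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]
    # majority vote among the k neighbours (integer class labels assumed)
    return np.array([np.bincount(y_train[row]).argmax() for row in nn])
```

In this framing, the paper's result corresponds to the observation that classifying with the full (X_train, y_train) memory tends to generalize better on the listed NLP tasks than classifying with the edited memory returned by edit_exceptions.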