A generalized method for iterative error mining in parsing results

  • Authors:
  • Daniël de Kok;Jianqiang Ma;Gertjan van Noord

  • Affiliations:
  • University of Groningen;University of Groningen;University of Groningen

  • Venue:
  • GEAF '09 Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks
  • Year:
  • 2009

Abstract

Error mining is a useful technique for identifying forms that cause incomplete parses of sentences. We extend the iterative method of Sagot and de la Clergerie (2006) to treat n-grams of arbitrary length. An inherent problem of incorporating longer n-grams is data sparseness. Our new method takes sparseness into account, producing n-grams that are as long as necessary to identify problematic forms, but no longer. Not every cause of parsing errors can be captured effectively by looking at word n-grams, so we also report on an algorithm for building more general mining patterns that consist of words and part-of-speech tags. Evaluating error mining techniques is not easy; we propose a new evaluation metric that enables us to compare different error miners.
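To make the idea of iterative error mining concrete, the following is a minimal sketch of a unigram-only suspicion computation, loosely in the spirit of the iterative scheme that the abstract attributes to Sagot and de la Clergerie (2006). It is not the method of this paper: it does not handle longer n-grams, sparseness, or part-of-speech patterns. The function name `mine_errors`, the fixed iteration count, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def mine_errors(sentences, iterations=10):
    """Hypothetical helper: return a dict mapping each form to a suspicion score.

    `sentences` is a list of (tokens, parsed_ok) pairs. This is an
    illustrative unigram sketch, not the authors' implementation.
    """
    sentences = [(toks, ok) for toks, ok in sentences if toks]

    # Start by spreading the error of each unparsable sentence (1.0)
    # uniformly over its tokens; parsable sentences contribute 0.
    occ = [[(0.0 if ok else 1.0) / len(toks)] * len(toks)
           for toks, ok in sentences]

    form_susp = {}
    for _ in range(iterations):
        # Suspicion of a form: mean suspicion of all its occurrences.
        totals = defaultdict(float)
        counts = defaultdict(int)
        for (toks, _ok), scores in zip(sentences, occ):
            for tok, score in zip(toks, scores):
                totals[tok] += score
                counts[tok] += 1
        form_susp = {form: totals[form] / counts[form] for form in totals}

        # Redistribute the error of each unparsable sentence over its
        # tokens in proportion to the current form suspicions.
        for i, (toks, ok) in enumerate(sentences):
            if ok:
                continue
            norm = sum(form_susp[tok] for tok in toks)
            if norm > 0:
                occ[i] = [form_susp[tok] / norm for tok in toks]
    return form_susp


# Toy corpus (invented): "zorp" occurs only in unparsable sentences.
corpus = [
    ("the cat sleeps".split(), True),
    ("the dog sleeps".split(), True),
    ("the zorp sleeps".split(), False),
    ("a zorp barks".split(), False),
    ("a dog barks".split(), True),
]
scores = mine_errors(corpus)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy corpus, forms that also appear in parsable sentences have their suspicion diluted across iterations, so blame concentrates on the form seen only in failed parses. Extending the unit of analysis from single words to longer n-grams is where the sparseness problem described in the abstract arises.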