A generalized method for iterative error mining in parsing results

Authors:
Daniël de Kok;Jianqiang Ma;Gertjan van Noord
Affiliations:
University of Groningen;University of Groningen;University of Groningen
Venue:
GEAF '09 Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks
Year:
2009

Citing 4

Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms

Information Retrieval
Information Retrieval

Error mining for wide-coverage grammar engineering
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Error mining in parsing results
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Cited 8

Chart mining-based lexical acquisition with precision grammars
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Grammar-driven versus data-driven: which parsing system is more affected by domain shifts?
NLPLING '10 Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

Using unknown word techniques to learn known words
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Benchmarking for syntax-based sentential inference
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Reducing overdetections in a French symbolic grammar checker by classification
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II

Error mining on dependency trees
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Generation for grammar engineering
INLG '12 Proceedings of the Seventh International Natural Language Generation Conference

An automatic approach to treebank error detection using a dependency parser
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Abstract

Error mining is a useful technique for identifying forms that cause incomplete parses of sentences. We extend the iterative method of Sagot and de la Clergerie (2006) to treat n-grams of arbitrary length. An inherent problem of incorporating longer n-grams is data sparseness. Our new method takes sparseness into account, producing n-grams that are as long as necessary to identify problematic forms, but no longer. Not every cause of parsing errors can be captured effectively by looking at word n-grams. We therefore report on an algorithm for building more general mining patterns that consist of words and part-of-speech tags. Evaluating error mining techniques is not easy, so we propose a new evaluation metric that enables different error miners to be compared.
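
To make the idea of iterative error mining concrete, the following is a minimal sketch restricted to single words (unigrams). The abstract does not spell out the update rule, so the scheme here is an assumption in the spirit of the iterative method it cites: blame for each unparsable sentence is spread over its words, each word's suspicion is the mean blame over all its occurrences, and the blame is then redistributed in proportion to those suspicions. The function name `mine_errors`, the `(tokens, parsed_ok)` input format, and the toy corpus are hypothetical; the n-gram expansion and the part-of-speech patterns described in the abstract are not covered.

```python
from collections import defaultdict


def mine_errors(sentences, iterations=10):
    """Return a dict mapping each word form to a suspicion score.

    sentences: list of (tokens, parsed_ok) pairs (assumed input format).
    """
    # Observation suspicion: one value per (sentence index, token position).
    # A failed sentence starts with its blame spread uniformly over its tokens;
    # tokens of successfully parsed sentences carry no blame.
    obs = {}
    for i, (tokens, ok) in enumerate(sentences):
        for j in range(len(tokens)):
            obs[(i, j)] = 0.0 if ok else 1.0 / len(tokens)

    form_susp = {}
    for _ in range(iterations):
        # Form suspicion: mean observation suspicion over all occurrences.
        totals = defaultdict(float)
        counts = defaultdict(int)
        for (i, j), s in obs.items():
            form = sentences[i][0][j]
            totals[form] += s
            counts[form] += 1
        form_susp = {f: totals[f] / counts[f] for f in totals}

        # Redistribute the blame inside each failed sentence in proportion
        # to the current form suspicions.
        for i, (tokens, ok) in enumerate(sentences):
            if ok:
                continue
            norm = sum(form_susp[t] for t in tokens)
            for j, t in enumerate(tokens):
                obs[(i, j)] = form_susp[t] / norm if norm > 0 else 1.0 / len(tokens)
    return form_susp


# Toy corpus: "frobs" occurs only in sentences that failed to parse.
corpus = [
    (["the", "cat", "sleeps"], True),
    (["the", "dog", "sleeps"], True),
    (["the", "cat", "frobs"], False),
    (["the", "dog", "frobs"], False),
]
suspicion = mine_errors(corpus)
for form, score in sorted(suspicion.items(), key=lambda kv: -kv[1]):
    print(f"{form}\t{score:.3f}")
```

In this toy corpus the word that appears only in failed sentences ends up with the highest score, while words that also occur in parsed sentences lose suspicion with each iteration. Treating longer n-grams as forms would follow the same pattern but raises the sparseness issue the abstract mentions.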