Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
Information Retrieval
Error mining for wide-coverage grammar engineering
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Error mining in parsing results
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Chart mining-based lexical acquisition with precision grammars
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Grammar-driven versus data-driven: which parsing system is more affected by domain shifts?
NLPLING '10 Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground
Using unknown word techniques to learn known words
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Benchmarking for syntax-based sentential inference
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Reducing overdetections in a French symbolic grammar checker by classification
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Error mining on dependency trees
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Generation for grammar engineering
INLG '12 Proceedings of the Seventh International Natural Language Generation Conference
An automatic approach to treebank error detection using a dependency parser
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Error mining is a useful technique for identifying forms that cause incomplete parses of sentences. We extend the iterative method of Sagot and de la Clergerie (2006) to handle n-grams of arbitrary length. An inherent problem with incorporating longer n-grams is data sparseness. Our new method takes sparseness into account, producing n-grams that are as long as necessary to identify problematic forms, but no longer. Not every cause of parsing errors can be captured effectively by looking at word n-grams. We therefore report on an algorithm for building more general mining patterns that consist of words and part-of-speech tags. Evaluating error mining techniques is not easy. We propose a new evaluation metric that enables us to compare different error miners.
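The idea of growing n-grams only as far as needed can be illustrated with a small sketch. It is not the iterative method of Sagot and de la Clergerie (2006) nor the algorithm of this paper. It assumes a simpler ratio-based suspicion score (the share of sentences containing an n-gram that failed to parse), and the function name `suspicion_table`, the `min_count` sparseness threshold, and the expansion rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def suspicion_table(sentences, max_n=3, min_count=2):
    """Score n-grams by how often they occur in unparsable sentences.

    sentences: list of (tokens, parsed_ok) pairs.
    Assumption: suspicion = failed sentences / all sentences containing
    the n-gram (a simplified ratio, not the paper's iterative estimate).
    """
    total = Counter()
    failed = Counter()
    for tokens, parsed_ok in sentences:
        for n in range(1, max_n + 1):
            # Count each n-gram at most once per sentence.
            for gram in set(ngrams(tokens, n)):
                total[gram] += 1
                if not parsed_ok:
                    failed[gram] += 1

    # Crude guard against sparseness: ignore rarely seen n-grams.
    susp = {g: failed[g] / total[g] for g in total if total[g] >= min_count}

    # Keep a longer n-gram only if it is strictly more suspicious than
    # both of its (n-1)-gram sub-parts; otherwise the shorter form suffices.
    kept = {}
    for gram, score in susp.items():
        if len(gram) == 1:
            kept[gram] = score
            continue
        parts = (gram[:-1], gram[1:])
        if all(score > susp.get(p, 0.0) for p in parts):
            kept[gram] = score
    return kept


if __name__ == "__main__":
    corpus = [
        (["of", "the", "house"], True),
        (["a", "course", "here"], True),
        (["of", "course", "yes"], False),
        (["yes", "of", "course"], False),
    ]
    table = suspicion_table(corpus, max_n=2, min_count=2)
    for gram, score in sorted(table.items(), key=lambda kv: -kv[1]):
        print(" ".join(gram), round(score, 2))
```

In this toy corpus the bigram "of course" is kept because it is more suspicious than either "of" or "course" alone, while bigrams seen only once are dropped by the frequency threshold.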