Error analysis is a key step in developing statistical parsers: typical failure cases are usually discovered by manually examining parser output. In this paper we argue that this process can be sped up by considering the output of an ensemble of parsers. We build the ensemble by resampling small proportions (from 10% upwards) of the training data and exploiting the high diversity of the resulting parsers, which stems from the sparseness of natural-language data. By varying the sample size, we can trace the gradual learning of each instance and classify instances into a few types. This division helps distinguish instances that are hard for the system from instances that may be learned in principle. We suggest that such analysis can yield a qualitative approach to the evaluation of statistical parsers.
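The trace-and-classify idea can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's actual setup: in place of statistical parsers trained on resampled treebank fractions, it trains simple 1-nearest-neighbour classifiers on increasing fractions of a toy dataset, records for each test instance whether it is classified correctly at each sample size, and then sorts instances into a crude typology (the function names, fractions, and labels "easy"/"hard"/"learned"/"unstable" are all invented for this sketch).

```python
import random

def knn_predict(train, x):
    """1-nearest-neighbour prediction on 1-D points (toy stand-in for a parser)."""
    nearest = min(train, key=lambda t: abs(t[0] - x))
    return nearest[1]

def trace_learning(train, test, fractions, seed=0):
    """For each test instance, record correctness at each training-sample size,
    mimicking the paper's idea of tracing gradual learning under resampling."""
    rng = random.Random(seed)
    traces = {i: [] for i in range(len(test))}
    for frac in fractions:
        k = max(1, int(frac * len(train)))
        sample = rng.sample(train, k)  # resample a small proportion of the data
        for i, (x, y) in enumerate(test):
            traces[i].append(knn_predict(sample, x) == y)
    return traces

def classify_instance(trace):
    """Crude typology over a correctness trace: 'easy' if always right,
    'hard' if never right, 'learned' if it becomes right as the sample grows."""
    if all(trace):
        return "easy"
    if not any(trace):
        return "hard"
    return "learned" if trace[-1] else "unstable"
```

A short usage example: with `train = [(float(x), int(x > 0)) for x in range(-5, 6) if x != 0]` and test instances such as `(4.0, 1)`, `trace_learning(train, test, [0.3, 0.6, 1.0])` yields one boolean trace per instance, and `classify_instance` separates instances the toy learner always gets right from those it only masters once the sample is large enough.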