Statistical NLP systems are frequently evaluated and compared on the basis of their performance on a single split of training and test data. Results obtained from a single split are, however, subject to sampling noise. In this paper, we argue in favour of reporting a distribution of performance figures, obtained by resampling the training data, rather than a single number. The additional information in these distributions can be used to make statistically quantified statements about differences across parameter settings, systems, and corpora.
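As a rough illustration of what reporting a distribution rather than a single number could look like, the sketch below retrains a toy tagger on resampled training sets and summarises the resulting accuracies. The abstract does not say which resampling scheme is used, so bootstrap resampling (sampling with replacement) is an assumption here. The unigram tagger, the synthetic corpus, and all function names are hypothetical stand-ins, not the systems or data evaluated in the paper.

```python
import random
import statistics
from collections import Counter, defaultdict


def make_toy_corpus(n, seed):
    """Hypothetical synthetic (word, tag) pairs; each word prefers one tag."""
    rng = random.Random(seed)
    tags = ["N", "V", "A"]
    data = []
    for _ in range(n):
        word_id = rng.randrange(60)
        preferred = tags[word_id % 3]
        # 80% of the time use the preferred tag, otherwise a random tag.
        tag = preferred if rng.random() < 0.8 else rng.choice(tags)
        data.append((f"w{word_id}", tag))
    return data


def train_unigram_tagger(pairs):
    """Return a function mapping a word to its most frequent training tag."""
    by_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in pairs:
        by_word[word][tag] += 1
        overall[tag] += 1
    fallback = overall.most_common(1)[0][0]  # used for unseen words
    best = {w: c.most_common(1)[0][0] for w, c in by_word.items()}
    return lambda word: best.get(word, fallback)


def accuracy(tagger, pairs):
    """Fraction of pairs whose tag the tagger predicts correctly."""
    return sum(tagger(w) == t for w, t in pairs) / len(pairs)


def resampled_accuracies(train, test, n_resamples=200, seed=0):
    """Retrain on bootstrap resamples of train; score each on the fixed test set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        sample = rng.choices(train, k=len(train))  # sampling with replacement
        scores.append(accuracy(train_unigram_tagger(sample), test))
    return scores


def percentile_interval(scores, lower=0.025, upper=0.975):
    """Simple percentile interval read off the sorted scores."""
    ordered = sorted(scores)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


corpus = make_toy_corpus(600, seed=1)
train, test = corpus[:400], corpus[400:]
scores = resampled_accuracies(train, test)
low, high = percentile_interval(scores)
print(f"mean={statistics.mean(scores):.3f} sd={statistics.stdev(scores):.3f} "
      f"95% interval=({low:.3f}, {high:.3f})")
```

Two systems or parameter settings could then be compared by running both on the same resamples and examining the distribution of their score differences, instead of comparing two single figures.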