Why nitpicking works: evidence for Occam's Razor in error correctors

  • Authors:
  • Dekai Wu, Grace Ngai, Marine Carpuat

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong; Hong Kong Polytechnic University, Kowloon, Hong Kong; Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

  • Venue:
  • COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
  • Year:
  • 2004


Abstract

Empirical experience and observation show that when powerful, highly tunable classifiers such as maximum entropy models, boosting, and SVMs are applied to language processing tasks, they can achieve high accuracy, but their performance eventually plateaus at around the same point. To push performance further, various error correction mechanisms have been developed; in practice, however, most cannot be relied on to predictably improve performance on unseen data. Indeed, depending on the test set, they are as likely to degrade accuracy as to improve it. This problem is especially severe when the base classifier has already been finely tuned. In recent work, we introduced N-fold Templated Piped Correction, or NTPC ("nitpick"), an error corrector designed to work in these extreme operating conditions. Despite its simplicity, it consistently and robustly improves the accuracy of existing highly accurate base models. This paper investigates some of the more surprising claims made by NTPC and presents experiments supporting an Occam's Razor argument: more complex models are damaging or unnecessary in practice.