Measuring contextual fitness using error contexts extracted from the Wikipedia revision history

Authors:
Torsten Zesch
Affiliations:
Technische Universität Darmstadt
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Citing 23
Cited 0

Context based spelling correction

Information Processing and Management: an International Journal
Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Contextual correlates of synonymy

Communications of the ACM
Scaling Up Context-Sensitive Text Correction

Proceedings of the Thirteenth Conference on Innovative Applications of Artificial Intelligence Conference
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Contextual spelling correction using latent semantic analysis

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Combining Trigram-based and feature-based methods for context-sensitive spelling correction

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
A spelling correction program based on a noisy channel model

COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 2
Correcting real-word spelling errors by restoring lexical cohesion

Natural Language Engineering
Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Computational Linguistics
Incorporating non-local information into information extraction systems by Gibbs sampling

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Efficient parsing of highly ambiguous context-free grammars with bit vectors

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Semantic similarity for detecting recognition errors in automatic speech transcripts

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Context-Sensitive Error Correction: Using Topic Models to Improve OCR

ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02
Using wiktionary for computing semantic relatedness

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Real-word spelling correction using Google Web IT 3-grams

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words

Natural Language Engineering
Real-word spelling correction with trigrams: a reconsideration of the Mays, Damerau, and Mercer model

CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
For the sake of simplicity: unsupervised extraction of lexical simplifications from Wikipedia

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Annotating ESL errors: challenges and rewards

IUNLPBEA '10 Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications
Evaluating models of latent document semantics in the presence of OCR errors

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Wikipedia revision toolkit: efficiently accessing Wikipedia's edit history

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations

Quantified Score

Hi-index	0.00

Visualization

Abstract

We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated. Additionally, we show that knowledge-based approaches can be improved by using semantic relatedness measures that make use of knowledge beyond classical taxonomic relations. Finally, we show that statistical and knowledge-based methods can be combined for increased performance.