Successfully detecting and correcting false friends using channel profiles

Authors: Ulrich Reffle; Annette Gotscharek; Christoph Ringlstetter; Klaus U. Schulz
Affiliations: University of Munich (LMU); University of Munich (LMU); University of Alberta; University of Munich (LMU)
Venue: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data
Year: 2008

Abstract

The detection and correction of false friends (also called real-word errors) is a notoriously difficult problem. On realistic data, automatic correction has so far failed to reach the break-even point: the infelicitous corrections it introduced outnumbered the useful ones. We present a new approach in which we first compute a profile of the error channel for the given text. During correction, the profile restricts attention to a small set of "suspicious" lexical tokens of the input text, namely those for which it is "plausible" to assume that the token is a false friend. Combined with conventional word trigram statistics for disambiguation, this yields a correction method that can be applied successfully to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
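The abstract does not specify how the channel profile is represented or how it is combined with the trigram statistics, so the following Python sketch only illustrates the general idea under explicit assumptions. The profile here is a hypothetical, hand-written dict of OCR substitution patterns with fixed probabilities (in the paper it is computed from the text itself). A token counts as "suspicious" if it is a lexicon word and undoing one profile pattern turns it into another lexicon word. Disambiguation compares, for the observed word and each candidate, the log channel probability plus add-one-smoothed word trigram log probabilities. The names `PROFILE`, `P_NO_ERROR`, `variants`, `TrigramLM` and `correct` are illustrative and not taken from the paper.

```python
import math
from collections import Counter

# HYPOTHETICAL channel profile: (correct substring, OCR output) -> P(OCR output | correct).
# The paper computes a profile per text; here it is simply assumed.
PROFILE = {("rn", "m"): 0.05, ("e", "c"): 0.02, ("l", "i"): 0.01}
P_NO_ERROR = 0.95  # assumed probability that a lexical token was recognized correctly


def variants(token, profile):
    """Yield (candidate, prob) by undoing one profile pattern at one position."""
    for (correct_part, ocr_part), prob in profile.items():
        start = token.find(ocr_part)
        while start != -1:
            yield token[:start] + correct_part + token[start + len(ocr_part):], prob
            start = token.find(ocr_part, start + 1)


class TrigramLM:
    """Add-one smoothed word trigram model (a simple stand-in, not the paper's statistics)."""

    def __init__(self, sentences):
        self.tri, self.bi, vocab = Counter(), Counter(), set()
        for sent in sentences:
            words = ["<s>", "<s>"] + list(sent) + ["</s>"]
            vocab.update(words)
            for i in range(2, len(words)):
                self.tri[tuple(words[i - 2:i + 1])] += 1
                self.bi[tuple(words[i - 2:i])] += 1
        self.v = len(vocab)

    def logp(self, w1, w2, w3):
        # log P(w3 | w1, w2) with add-one smoothing
        return math.log((self.tri[(w1, w2, w3)] + 1) / (self.bi[(w1, w2)] + self.v))


def local_score(lm, words, i):
    """Sum of log P over the (up to three) trigrams that contain position i."""
    return sum(lm.logp(*words[j - 2:j + 1]) for j in range(i, min(i + 3, len(words))))


def correct(tokens, lexicon, lm, profile=PROFILE):
    words = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    for i in range(2, len(words) - 1):
        token = words[i]
        if token not in lexicon:
            continue  # non-word errors are out of scope for this sketch
        cands = {c: p for c, p in variants(token, profile) if c in lexicon and c != token}
        if not cands:
            continue  # not suspicious: no profiled error maps another word onto this one
        best = token
        best_score = math.log(P_NO_ERROR) + local_score(lm, words, i)
        for cand, prob in cands.items():
            words[i] = cand  # score the candidate in the current left/right context
            score = math.log(prob) + local_score(lm, words, i)
            if score > best_score:
                best, best_score = cand, score
        words[i] = best
    return words[2:-1]


if __name__ == "__main__":
    corpus = [["the", "modern", "world"]] * 10 + [["the", "modem", "is", "slow"]] * 2
    lm = TrigramLM(corpus)
    lexicon = {w for sent in corpus for w in sent}
    print(correct(["the", "modem", "world"], lexicon, lm))  # prints ['the', 'modern', 'world']
    print(correct(["the", "modem", "is", "slow"], lexicon, lm))  # prints ['the', 'modem', 'is', 'slow']
```

With this toy corpus, "modem" in "the modem world" is replaced by "modern", while "the modem is slow" is left unchanged because its trigram context supports the observed word. Tokens such as "the" or "world" are never scored at all, since no profiled pattern maps another lexicon word onto them; restricting the expensive and error-prone disambiguation step to such a small suspicious set is the point of using the profile.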