Successfully detecting and correcting false friends using channel profiles

  • Authors: Ulrich Reffle; Annette Gotscharek; Christoph Ringlstetter; Klaus U. Schulz
  • Affiliations: University of Munich (LMU); University of Munich (LMU); University of Alberta; University of Munich (LMU)
  • Venue: Proceedings of the second workshop on Analytics for noisy unstructured text data
  • Year: 2008

Abstract

The detection and correction of false friends, also called real-word errors, is a notoriously difficult problem. On realistic data, the break-even point for automatic correction has so far not been reached: the infelicitous corrections introduced by the system outnumbered the useful ones. We present a new approach in which we first compute a profile of the error channel for the given text. During the correction process, the profile helps to restrict attention to a small set of "suspicious" lexical tokens of the input text, namely those for which it is "plausible" to assume that the token represents a false friend. Using conventional word trigram statistics for disambiguation, we obtain a correction method that can be successfully applied to unrestricted text. In experiments on OCR documents, we show significant accuracy gains from fully automatic correction of false friends.
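
The pipeline described in the abstract (channel profile, restriction to suspicious tokens, trigram disambiguation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the profile is a dictionary mapping (true substring, observed substring) pairs to estimated probabilities, that trigram statistics are raw counts, and that a candidate is accepted only if its add-one smoothed count exceeds that of the observed token by a fixed factor. All names (`confusion_candidates`, `correct`, `min_prob`, `margin`) and all numbers are hypothetical.

```python
from collections import Counter

def confusion_candidates(token, lexicon, profile, min_prob=0.01):
    """Lexicon words that one profiled channel substitution could have turned into `token`."""
    candidates = []
    for (truth, observed), prob in profile.items():
        if prob < min_prob or not observed:
            continue  # ignore substitutions the profile considers implausible
        start = token.find(observed)
        while start != -1:
            # Undo the substitution at this position and keep real words only.
            candidate = token[:start] + truth + token[start + len(observed):]
            if candidate != token and candidate in lexicon:
                candidates.append(candidate)
            start = token.find(observed, start + 1)
    return candidates

def correct(tokens, lexicon, profile, trigram_counts, margin=2.0):
    """Replace suspicious real-word tokens when the trigram context clearly prefers a candidate."""
    corrected = list(tokens)
    for i in range(1, len(tokens) - 1):
        token = tokens[i]
        if token not in lexicon:
            continue  # non-word errors are outside the scope of this sketch
        candidates = confusion_candidates(token, lexicon, profile)
        if not candidates:
            continue  # token is not suspicious under the profile
        left, right = tokens[i - 1], tokens[i + 1]
        best = max(candidates, key=lambda c: trigram_counts[(left, c, right)])
        # Add-one smoothing (an assumption of this sketch) avoids division by zero.
        observed_score = trigram_counts[(left, token, right)] + 1
        best_score = trigram_counts[(left, best, right)] + 1
        if best_score / observed_score >= margin:
            corrected[i] = best
    return corrected

# Hypothetical toy data: the OCR channel sometimes reads "rn" as "m".
profile = {("rn", "m"): 0.05}
lexicon = {"the", "a", "is", "modern", "modem", "world", "connected"}
trigram_counts = Counter({("the", "modern", "world"): 5, ("a", "modem", "is"): 4})

fixed = correct(["the", "modem", "world"], lexicon, profile, trigram_counts)
kept = correct(["a", "modem", "is", "connected"], lexicon, profile, trigram_counts)
```

With this toy data, `fixed` becomes `['the', 'modern', 'world']`, while `kept` leaves "modem" untouched because the trigram context supports the observed word. The `margin` threshold stands in for the general idea of correcting only when the evidence is strong, which is what keeps harmful corrections from outnumbering useful ones.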