Generating confusion sets for context-sensitive error correction

  • Authors:
  • Alla Rozovskaya;Dan Roth

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we consider the problem of generating candidate corrections for the task of correcting errors in text. We focus on the task of correcting errors in preposition usage made by non-native English speakers, using discriminative classifiers. The standard approach to the problem assumes that the set of candidate corrections for a preposition consists of all preposition choices participating in the task. We determine likely preposition confusions using an annotated corpus of non-native text and use this knowledge to produce smaller sets of candidates. We propose several methods of restricting candidate sets. These methods exclude candidate prepositions that are not observed as valid corrections in the annotated corpus and take into account the likelihood of each preposition confusion in the non-native text. We find that restricting candidates to those that are observed in the non-native data improves both the precision and the recall compared to the approach that views all prepositions as possible candidates. Furthermore, the approach that takes into account the likelihood of each preposition confusion is shown to be the most effective.