Exploiting syntactic and distributional information for spelling correction with web-scale n-gram models

  • Authors:
  • Wei Xu; Joel Tetreault; Martin Chodorow; Ralph Grishman; Le Zhao

  • Affiliations:
  • New York University, NY; Educational Testing Service, Princeton, NJ; Hunter College of CUNY, New York, NY; New York University, NY; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Abstract

We propose a novel way of incorporating dependency-parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides evidence beyond that available from a web-scale n-gram corpus and is especially helpful where the n-gram data are sparse. Experimental results show that introducing syntactic features into n-gram-based models significantly reduces errors, by up to 12.4% over the current state of the art. The word co-occurrence information shows potential but improves overall accuracy only slightly.
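
To make the general idea concrete, the following is a minimal sketch of how dependency evidence might supplement n-gram counts when choosing among confusable words. It is an illustration only and not the model described in the paper: the count tables, the add-one smoothing, the additive log-count scoring, and the names `ngram_score`, `dep_score`, `correct`, and `dep_weight` are all assumptions introduced here.

```python
import math

# Toy counts standing in for a web-scale n-gram corpus (hypothetical values).
NGRAM_COUNTS = {
    ("a", "piece", "of"): 900,
    ("piece", "of", "cake"): 800,
    ("a", "peace", "of"): 40,
    ("peace", "of", "cake"): 1,
}

# Toy counts of (head, relation, dependent) triples (hypothetical values).
DEP_COUNTS = {
    ("ate", "dobj", "piece"): 50,
}


def ngram_score(tokens, i, candidate, n=3):
    """Sum add-one-smoothed log counts of every n-gram covering position i."""
    words = tokens[:i] + [candidate] + tokens[i + 1:]
    score = 0.0
    for start in range(i - n + 1, i + 1):
        if start < 0 or start + n > len(words):
            continue  # this window would fall outside the sentence
        gram = tuple(words[start:start + n])
        score += math.log(NGRAM_COUNTS.get(gram, 0) + 1)
    return score


def dep_score(head, relation, candidate):
    """Add-one-smoothed log count of the dependency triple."""
    return math.log(DEP_COUNTS.get((head, relation, candidate), 0) + 1)


def correct(tokens, i, confusion_set, head=None, relation=None, dep_weight=1.0):
    """Pick the candidate with the highest combined score for position i.

    Ties go to the earliest candidate in confusion_set.
    """
    def total(candidate):
        score = ngram_score(tokens, i, candidate)
        if head is not None:
            score += dep_weight * dep_score(head, relation, candidate)
        return score

    return max(confusion_set, key=total)


if __name__ == "__main__":
    candidates = ["peace", "piece"]
    # Dense n-gram context: the n-gram counts alone separate the candidates.
    print(correct(["ate", "a", "peace", "of", "cake"], 2, candidates))
    # Sparse n-gram context: no covering trigram has been seen, so the
    # dependency triple supplies the only evidence.
    sparse = ["ate", "the", "delicious", "peace"]
    print(correct(sparse, 3, candidates))
    print(correct(sparse, 3, candidates, head="ate", relation="dobj"))
```

In the sparse sentence no trigram covering the target word appears in the toy table, so both candidates tie on n-gram evidence and the original word is kept. Supplying the verb and relation breaks the tie, which is the kind of data-sparsity case the abstract refers to.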