Spelling correction using probabilistic methods

  • Authors:
  • R. L. Kashyap; B. J. Oommen

  • Affiliations:
  • School of Electrical Engineering, Purdue University, West Lafayette, IN 47907, USA; Dept. of Computer Science, Carleton University, Ottawa, Ontario K1S 5B6, Canada

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 1984

Abstract

A probabilistic procedure is proposed for automatically correcting spelling and typing errors in printed English text. The heart of the procedure is a probabilistic model of how a garbled word is generated from the correct word. The garbler can delete symbols from the word, insert symbols into it, or substitute one or more symbols with other symbols. An expression is derived for P(Y | X), the probability of generating a garbled word Y from a correct word X. The model is probabilistically consistent. Using the expression for P(Y | X), the correct word can be estimated from the garbled word Y so as to minimize the average probability of error in the decision. An important feature of the expression for P(Y | X) is that it can be computed recursively. Experiments with a dictionary of the 1025 most common English words indicate that this scheme corrects substantially more accurately than other algorithms, especially for garbled words derived from relatively short words of length less than 6.
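
The abstract does not reproduce the paper's expression for P(Y | X), so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a toy garbler, invented here for illustration: before each symbol of X and after the last one, the garbler inserts a uniformly random symbol with probability p_ins, repeatedly; each symbol of X is then deleted with probability p_del, replaced by a different uniformly chosen symbol with probability p_sub, or kept otherwise. The function names, parameter values, and alphabet are all assumptions. Because this toy process is a complete generative procedure, its probabilities sum to one over all Y, and P(Y | X) can be computed recursively over prefixes of X and Y. Choosing the dictionary word that maximizes P(Y | X) P(X) is the standard Bayes rule for minimizing the probability of a wrong decision.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed symbol set


def generation_prob(y, x, p_ins=0.05, p_del=0.05, p_sub=0.05, alphabet=ALPHABET):
    """P(Y | X) under the toy garbler described above (hypothetical model).

    Assumes every symbol of x and y belongs to `alphabet`, which must
    contain at least two symbols.
    """
    a = len(alphabet)
    p_keep = 1.0 - p_del - p_sub
    advance = 1.0 - p_ins  # probability of ending an insertion run
    n, m = len(x), len(y)

    # f[i][j]: probability that the garbler has processed x[:i] and
    # emitted exactly y[:j], summed over all edit sequences.
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:
                # y[j-1] was inserted before x[i] (or after the end of x).
                total += f[i][j - 1] * p_ins / a
            if i > 0:
                # x[i-1] was deleted.
                total += f[i - 1][j] * advance * p_del
                if j > 0:
                    # x[i-1] was kept, or substituted by y[j-1].
                    same = x[i - 1] == y[j - 1]
                    emit = p_keep if same else p_sub / (a - 1)
                    total += f[i - 1][j - 1] * advance * emit
            f[i][j] = total
    # The garbler must also stop inserting after the last symbol of x.
    return f[n][m] * advance


def correct(y, priors):
    """Return the dictionary word x maximizing P(y | x) * P(x).

    `priors` maps each dictionary word to its prior probability P(x).
    """
    return max(priors, key=lambda x: generation_prob(y, x) * priors[x])
```

For example, with equal priors over the hypothetical three-word dictionary {"world", "word", "would"}, calling correct("wrld", ...) selects "world", since a single deletion is far more probable under this toy model than any two-error explanation. The recursion fills an (|X| + 1) by (|Y| + 1) table, so each dictionary word costs time proportional to |X| |Y|.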