A fast generative spell corrector based on edit distance

Authors:
Ishan Chattopadhyaya;Kannappan Sirchabesan;Krishanu Seal
Affiliations:
MapQuest, AOL, India;MapQuest, AOL, India;MapQuest, AOL, India
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013

Abstract

One of the main challenges in implementing web-scale online search systems is disambiguating user input when portions of the input queries may be misspelt. Spell correctors that must be integrated with such systems are subject to very stringent restrictions; primarily, they must handle a large volume of concurrent queries and generate relevant spelling suggestions at very high speed. Often, these systems consist of high-end server machines with lots of memory and processing power, and the requirement for such spell correctors is to keep the latency of generating suggestions to a bare minimum. In this paper, we present a spell corrector that we developed to cater to high-volume incoming queries for an online search service. It consists of a fast, per-token candidate generator which generates spelling suggestions within a distance of two edit operations of an input token. We compare its performance against an n-gram-based spell corrector and show that the presented candidate generation approach has lower response times.
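
To make the idea of a per-token candidate generator concrete, the following is a minimal sketch of generating candidates within two edit operations of a token. It is not the authors' implementation, which the abstract does not describe in detail; it is a brute-force illustration under stated assumptions. The function names (edits1, candidates), the lowercase ASCII alphabet, the in-memory set used as a dictionary, and the choice to count an adjacent transposition as a single edit are all assumptions made for this example.

```python
import string

# Assumption: tokens are lowercase ASCII; a real system would use its own alphabet.
ALPHABET = string.ascii_lowercase


def edits1(token):
    """Return every string exactly one edit operation away from `token`.

    Assumed edit operations: delete, adjacent transposition, replace, insert.
    """
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:]
                for left, right in splits if right for ch in ALPHABET]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def candidates(token, dictionary, max_distance=2):
    """Return dictionary words reachable from `token` in at most `max_distance` edits."""
    frontier = {token}
    seen = {token}
    for _ in range(max_distance):
        # Expand only strings not generated in an earlier round.
        frontier = {e for t in frontier for e in edits1(t)} - seen
        seen |= frontier
    # Keep only generated strings that are real dictionary entries.
    return seen & dictionary


if __name__ == "__main__":
    # Hypothetical three-word dictionary, for illustration only.
    words = {"search", "starch", "march"}
    print(sorted(candidates("serach", words)))
```

With this toy dictionary, "search" is one transposition away from "serach" and "starch" is two edits away, while "march" needs three edits and is excluded. Exhaustive generation of this kind grows quickly with token length and alphabet size, which is one reason a production corrector needs a faster scheme.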