Real-word spelling correction using Google Web 1T 3-grams

Authors:
Aminul Islam;Diana Inkpen
Affiliations:
University of Ottawa, Ottawa, ON, Canada;University of Ottawa, Ottawa, ON, Canada
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Year:
2009

Abstract

We present a method for detecting and correcting multiple real-word spelling errors using the Google Web 1T 3-gram data set and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Our method focuses mainly on improving the detection recall (the fraction of errors correctly detected) and the correction recall (the fraction of errors correctly amended), while keeping the respective precisions (the fraction of detections or amendments that are correct) as high as possible. Evaluation results on a standard data set show that our method outperforms two other methods on the same task.
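
The abstract names a normalized LCS measure but does not spell it out. As a rough illustration only, the sketch below computes the plain LCS length with dynamic programming and normalizes it by squaring the length and dividing by the product of the two string lengths. That normalization is an assumption made for this example, and the function names `lcs_length` and `normalized_lcs` are hypothetical; the paper's modified version may differ.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths.

    Returns a value in [0, 1]; identical non-empty strings score 1.0.
    """
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


if __name__ == "__main__":
    # Score a candidate replacement against the word observed in the text.
    for observed, candidate in [("there", "three"), ("form", "from")]:
        print(observed, candidate, normalized_lcs(observed, candidate))
```

A score like this could be used to rank candidate replacement words by their string similarity to the word observed in the text; how that score would be combined with 3-gram frequencies is not described in the abstract and is left out here.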