A large scale ranker-based system for search query spelling correction

Authors:
Jianfeng Gao;Xiaolong Li;Daniel Micol;Chris Quirk;Xu Sun
Affiliations:
Microsoft Research, Redmond;Microsoft Corporation;Microsoft Corporation;Microsoft Research, Redmond;University of Tokyo
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Year:
2010

Abstract

This paper makes three significant extensions to a noisy channel speller designed for standard written text to target the challenging domain of search queries. First, the noisy channel model is subsumed by a more general ranker, which allows a variety of features to be easily incorporated. Second, a distributed infrastructure is proposed for training and applying Web-scale n-gram language models. Third, a new phrase-based error model is presented. This model places a probability distribution over transformations between multi-word phrases, and is estimated using a large number of query-correction pairs derived from search logs. Experiments show that each of these extensions leads to significant improvements over the state-of-the-art baseline methods.
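
The first extension can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a linear ranker whose features include the two noisy channel terms, the language model log-probability log P(C) and the error model log-probability log P(Q|C). With unit weights on those two features and zero weight elsewhere, the ranker reproduces the noisy channel score, which is one way to read "subsumed". All probabilities, the extra exact-match feature, and the names `features`, `score`, and `rank` are hypothetical and chosen only for illustration.

```python
import math

# Toy, made-up probabilities for illustration only (not taken from the paper).
# log P(C): language model score of a candidate correction C.
LM_LOGPROB = {
    "britney spears": math.log(1e-4),
    "brittany spears": math.log(1e-6),
}
# log P(Q | C): error model score of observing query Q given intended C.
ERR_LOGPROB = {
    ("britny spears", "britney spears"): math.log(1e-2),
    ("britny spears", "brittany spears"): math.log(1e-3),
}


def features(query, candidate):
    """Return [LM log-prob, error-model log-prob, exact-match flag].

    The third feature is a hypothetical extra signal, showing how a
    ranker can take inputs that the noisy channel model has no slot for.
    """
    return [
        LM_LOGPROB[candidate],
        ERR_LOGPROB[(query, candidate)],
        1.0 if query == candidate else 0.0,
    ]


def score(weights, query, candidate):
    """Linear ranker score: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(query, candidate)))


def rank(weights, query, candidates):
    """Sort candidates from highest to lowest ranker score."""
    return sorted(candidates, key=lambda c: score(weights, query, c), reverse=True)


# With these weights the ranker score equals log P(C) + log P(Q | C),
# i.e. the noisy channel objective.
NOISY_CHANNEL_WEIGHTS = [1.0, 1.0, 0.0]

if __name__ == "__main__":
    query = "britny spears"
    candidates = ["brittany spears", "britney spears"]
    for c in rank(NOISY_CHANNEL_WEIGHTS, query, candidates):
        print(c, score(NOISY_CHANNEL_WEIGHTS, query, c))
```

In a trained ranker the weights would be learned from labeled query-correction data rather than fixed by hand, and further features could be appended to the vector without changing the scoring code.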