This paper makes three significant extensions to a noisy channel speller designed for standard written text to target the challenging domain of search queries. First, the noisy channel model is subsumed by a more general ranker, which allows a variety of features to be easily incorporated. Second, a distributed infrastructure is proposed for training and applying Web-scale n-gram language models. Third, a new phrase-based error model is presented. This model places a probability distribution over transformations between multi-word phrases, and is estimated from a large number of query-correction pairs derived from search logs. Experiments show that each of these extensions leads to significant improvements over the state-of-the-art baseline methods.
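To illustrate the first extension, the following is a minimal sketch of how a linear ranker can subsume a noisy channel speller. It is not the paper's implementation: the feature names (`log_lm`, `log_err`, `same_as_query`), the weights, and all probabilities are hypothetical values chosen for illustration. A noisy channel model scores a candidate correction C for a query Q as log P(C) + log P(Q|C); a linear ranker that uses those two log-probabilities as features with weight 1 reproduces that score, and further features can then be added with their own weights.

```python
import math

def noisy_channel_score(log_lm, log_err):
    """Classic noisy channel score: log P(C) + log P(Q | C)."""
    return log_lm + log_err

def ranker_score(features, weights):
    """Linear ranker: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Return the text of the highest-scoring candidate."""
    return max(candidates, key=lambda c: ranker_score(c["features"], weights))["text"]

# Toy candidates for the query "britny spears". All numbers are made up.
candidates = [
    {
        "text": "britny spears",
        "features": {
            "log_lm": math.log(1e-9),   # hypothetical language model probability
            "log_err": math.log(0.9),   # hypothetical error model probability
            "same_as_query": 1.0,       # hypothetical extra feature
        },
    },
    {
        "text": "britney spears",
        "features": {
            "log_lm": math.log(1e-6),
            "log_err": math.log(0.01),
            "same_as_query": 0.0,
        },
    },
]

# With these weights the ranker reduces to the noisy channel model.
noisy_channel_weights = {"log_lm": 1.0, "log_err": 1.0, "same_as_query": 0.0}

# A ranker is free to weight additional features; this weight is arbitrary.
conservative_weights = {"log_lm": 1.0, "log_err": 1.0, "same_as_query": 5.0}

if __name__ == "__main__":
    print(best_candidate(candidates, noisy_channel_weights))
    print(best_candidate(candidates, conservative_weights))
```

In this toy setup the extra feature changes which candidate wins, which is the practical point of moving from a fixed two-factor product to a ranker: in a real system the weights would be learned from labeled query-correction data rather than set by hand.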