Hidden Markov models for speech recognition
Technometrics
Techniques for automatically correcting words in text
ACM Computing Surveys (CSUR)
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Query expansion using local and global document analysis
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Improving automatic query expansion
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
An improved error model for noisy channel spelling correction
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Learning a spelling error model from search query logs
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Unsupervised query segmentation using generative language models and wikipedia
Proceedings of the 17th international conference on World Wide Web
A unified and discriminative model for query refinement
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Mining term association patterns from search logs for effective query reformulation
Proceedings of the 17th ACM conference on Information and knowledge management
Using the web for language independent spellchecking and autocorrection
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Learning phrase-based spelling error models from clickthrough data
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
A large scale ranker-based system for search query spelling correction
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Online spelling correction for query completion
Proceedings of the 20th international conference on World wide web
Web scale NLP: a case study on url word breaking
Proceedings of the 20th international conference on World wide web
Unsupervised query segmentation using clickthrough for information retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
A non-learning approach to spelling correction in web queries
Proceedings of the 22nd international conference on World Wide Web companion
Mining search and browse logs for web search: A Survey
ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
Hi-index | 0.00 |
Query spelling correction is a crucial component of modern search engines. Existing methods in the literature for search query spelling correction have two major drawbacks. First, they are unable to handle certain important types of spelling errors, such as concatenation and splitting. Second, they cannot efficiently evaluate all the candidate corrections due to the complex form of their scoring functions, and a heuristic filtering step must be applied to select a working set of top-K most promising candidates for final scoring, leading to non-optimal predictions. In this paper we address both limitations and propose a novel generalized Hidden Markov Model with discriminative training that can not only handle all the major types of spelling errors, including splitting and concatenation errors, in a single unified framework, but also efficiently evaluate all the candidate corrections to ensure the finding of a globally optimal correction. Experiments on two query spelling correction datasets demonstrate that the proposed generalized HMM is effective for correcting multiple types of spelling errors. The results also show that it significantly outperforms the current approach for generating top-K candidate corrections, making it a better first-stage filter to enable any other complex spelling correction algorithm to have access to a better working set of candidate corrections as well as to cover splitting and concatenation errors, which no existing method in academic literature can correct.