Exploring distributional similarity based models for query spelling correction

Authors:
Mu Li;Yang Zhang;Muhua Zhu;Ming Zhou
Affiliations:
Microsoft Research Asia, Haidian District, Beijing, China;Tianjin University, Tianjin, China;Northeastern University, Shenyang, Liaoning, China;Microsoft Research Asia, Haidian District, Beijing, China
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

A query speller is crucial to a search engine for improving web search relevance. This paper describes novel methods that use distributional similarity, estimated from query logs, to learn improved query spelling correction models. The key to our methods is a property of the distributional similarity between two terms: it is high between a frequently occurring misspelling and its correction, and low between two unrelated terms that merely have similar spellings. We present two models that take advantage of this property. Experimental results demonstrate that the distributional similarity based models significantly outperform their baseline systems on the web query spelling correction task.
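
The property described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity over the terms that co-occur with a word in the same query; the abstract does not state which similarity measure or context definition the paper uses, so both are assumptions made here for illustration. The query log, the function names (`context_vectors`, `cosine`), and the example terms are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(queries):
    """Map each term to a Counter of the other terms sharing a query with it."""
    vectors = {}
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            # Context = every other term in the same query (an assumption).
            context = terms[:i] + terms[i + 1:]
            vectors.setdefault(term, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy query log, invented purely for illustration.
TOY_LOG = [
    "britney spears lyrics",
    "britny spears lyrics",
    "britney spears photos",
    "britny spears photos",
    "brittany ferries timetable",
    "brittany ferries booking",
]

vectors = context_vectors(TOY_LOG)

# A misspelling and its correction appear in the same contexts.
sim_misspelling = cosine(vectors["britny"], vectors["britney"])

# A similarly spelled but unrelated term appears in different contexts.
sim_unrelated = cosine(vectors["britny"], vectors["brittany"])

print(f"britny vs britney:  {sim_misspelling:.3f}")
print(f"britny vs brittany: {sim_unrelated:.3f}")
```

In this toy log, "britny" and "britney" share every context term, while "britny" and "brittany" share none, so the first similarity is high and the second is zero even though all three strings are close in edit distance. A real query log would be far noisier, and how such a score is combined with a spelling error model is not specified in the abstract.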