Hashing-based approaches to spelling correction of personal names

Authors:
Raghavendra Udupa;Shaishav Kumar
Affiliations:
Microsoft Research India, Bangalore, India;Microsoft Research India, Bangalore, India
Venue:
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Year:
2010

Abstract

We propose two hashing-based solutions to the problem of fast and effective spelling correction of personal names in People Search applications. The key idea behind both methods is to learn hash functions that map similar names to similar, compact binary codewords. The two methods differ in the data they use to learn the hash functions: the first uses a set of names in a given language/script, whereas the second uses a set of bilingual names. We show that both methods give excellent retrieval performance in comparison to several baselines on two lists of misspelled personal names. Moreover, the method that uses bilingual data for learning hash functions gives the best performance.
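
The idea of mapping similar names to similar binary codewords can be illustrated with a minimal sketch. This is not the paper's method: the paper learns its hash functions from data, whereas the sketch below swaps in random hyperplane projections over character-bigram counts, purely to show how codewords and Hamming distance could be used for retrieval. All names here (`bigram_features`, `RandomHyperplaneHasher`, `correct`), the bigram features, and the 64-bit code length are assumptions made for illustration.

```python
import random
from collections import Counter


def bigram_features(name):
    """Count character bigrams of a lowercased name padded with ^ and $."""
    padded = "^" + name.lower() + "$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


class RandomHyperplaneHasher:
    """Maps a name to a num_bits-bit integer codeword.

    Assumption: random Gaussian projections stand in for the learned
    hash functions described in the abstract.
    """

    def __init__(self, num_bits=64, seed=0):
        self.num_bits = num_bits
        self.seed = seed
        self._weights = {}  # bigram -> list of num_bits projection weights

    def _weights_for(self, bigram):
        # Weights are derived from the bigram itself, so they are the same
        # on every run and no vocabulary has to be fixed in advance.
        if bigram not in self._weights:
            rng = random.Random(f"{self.seed}:{bigram}")
            self._weights[bigram] = [
                rng.gauss(0.0, 1.0) for _ in range(self.num_bits)
            ]
        return self._weights[bigram]

    def codeword(self, name):
        # Project the bigram-count vector onto each hyperplane,
        # then keep only the sign of each projection as one bit.
        sums = [0.0] * self.num_bits
        for bigram, count in bigram_features(name).items():
            for bit, weight in enumerate(self._weights_for(bigram)):
                sums[bit] += count * weight
        code = 0
        for s in sums:
            code = (code << 1) | (1 if s >= 0 else 0)
        return code


def hamming(a, b):
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")


def correct(query, names, hasher, top_k=3):
    """Return the top_k names whose codewords are closest to the query's."""
    q = hasher.codeword(query)
    return sorted(names, key=lambda n: hamming(q, hasher.codeword(n)))[:top_k]
```

A call such as `correct("Ragavendra", ["Raghavendra", "Shaishav", "Kumar"], RandomHyperplaneHasher())` ranks the directory names by Hamming distance to the query's codeword. Names that share many bigrams tend to agree on more bits, but with random projections this is only a tendency, which is one reason to learn the hash functions instead.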