String transformation, which maps a source string s into its desired form t*, underlies applications such as stemming, lemmatization, and spelling correction. An essential step in string transformation is generating the candidates into which a given string s is likely to be transformed. This paper presents a discriminative approach to generating candidate strings. We use substring substitution rules as features and score them with an L1-regularized logistic regression model. We also propose a procedure for generating negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm, because the string transformation processes are tractable in the model. We demonstrate the effectiveness of the proposed method in normalizing inflected words and spelling variations.
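The idea of scoring substring substitution rules with an L1-regularized logistic regression model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the rule extractor (`extract_rules`, which strips the common prefix and suffix and adds one character of left context), the toy training pairs, the hand-picked negative instances (the paper proposes its own procedure for generating them), and the optimizer (plain proximal gradient descent with soft-thresholding, chosen for brevity).

```python
import math

def extract_rules(s, t, context=1):
    """Hypothetical rule extractor: strip the common prefix and suffix of
    s and t, and return the differing middle as (source, target) rules,
    with 0..context characters of left context attached."""
    p = 0
    while p < min(len(s), len(t)) and s[p] == t[p]:
        p += 1
    q = 0
    while q < min(len(s), len(t)) - p and s[len(s) - 1 - q] == t[len(t) - 1 - q]:
        q += 1
    rules = set()
    for c in range(context + 1):
        start = max(0, p - c)
        rules.add((s[start:len(s) - q], t[start:len(t) - q]))
    return rules

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_l1_logreg(examples, l1=0.01, lr=0.5, epochs=200):
    """L1-regularized logistic regression over binary rule features,
    trained by proximal gradient descent (an assumed, simple optimizer)."""
    w = {}
    n = len(examples)
    for _ in range(epochs):
        grad = {}
        for feats, y in examples:
            p = sigmoid(sum(w.get(f, 0.0) for f in feats))
            for f in feats:
                grad[f] = grad.get(f, 0.0) + (p - y)
        for f, g in grad.items():
            v = w.get(f, 0.0) - lr * g / n
            # Soft-thresholding: the proximal step for the L1 penalty,
            # which drives weak rule weights to exactly zero.
            w[f] = math.copysign(max(abs(v) - lr * l1, 0.0), v)
    return w

def probability(w, s, t):
    """Model probability that s should be transformed into t."""
    return sigmoid(sum(w.get(f, 0.0) for f in extract_rules(s, t)))

# Toy data (illustrative only): correct lemmas are positive instances,
# hand-picked wrong truncations are negative instances.
positives = [("studies", "study"), ("cities", "city"), ("walked", "walk")]
negatives = [("studies", "studi"), ("cities", "citi"), ("walked", "walke")]
examples = [(extract_rules(s, t), 1) for s, t in positives]
examples += [(extract_rules(s, t), 0) for s, t in negatives]
weights = train_l1_logreg(examples)

print(probability(weights, "parties", "party"))
print(probability(weights, "parties", "parti"))
```

On this toy data the rule ("ies", "y") receives a positive weight and ("es", "") a negative one, so an unseen pair such as parties/party scores above parties/parti. The sketch only scores a given pair; it does not implement the efficient candidate enumeration described in the abstract.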