In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. This model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
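To make the definitions above concrete, the following is a minimal Python sketch of edit distance with pluggable edit costs, together with nearest-prototype classification against a small labeled database. It is an illustration only and not the stochastic model or learning procedure of the report: the names `edit_distance`, `unit_cost`, `vowel_cost`, and `classify` are hypothetical, and the costs in `vowel_cost` are hand-picked stand-ins for costs that would, in the approach described, be learned from a corpus of examples.

```python
from typing import Callable, Dict, Optional

# cost(a, b) scores one edit operation (hypothetical interface):
#   a is None -> insertion of b;  b is None -> deletion of a;
#   otherwise -> substitution of a by b (a == b is a match).
Cost = Callable[[Optional[str], Optional[str]], float]


def unit_cost(a: Optional[str], b: Optional[str]) -> float:
    """Untrained Levenshtein costs: 0 for a match, 1 for any other edit."""
    return 0.0 if a == b else 1.0


def vowel_cost(a: Optional[str], b: Optional[str]) -> float:
    """Hand-picked example costs (an assumption, not learned values):
    substituting one vowel for another costs 0.5 instead of 1."""
    vowels = "aeiou"
    if a is not None and b is not None and a != b and a in vowels and b in vowels:
        return 0.5
    return unit_cost(a, b)


def edit_distance(x: str, y: str, cost: Cost = unit_cost) -> float:
    """Minimum total cost of insertions, deletions, and substitutions
    that transform x into y, computed by dynamic programming."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = [0.0]
    for cy in y:
        prev.append(prev[-1] + cost(None, cy))
    for cx in x:
        curr = [prev[0] + cost(cx, None)]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + cost(cx, None),      # delete cx
                curr[j - 1] + cost(None, cy),  # insert cy
                prev[j - 1] + cost(cx, cy),    # substitute or match
            ))
        prev = curr
    return prev[-1]


def classify(query: str, prototypes: Dict[str, str], cost: Cost = unit_cost) -> str:
    """Return the label of the prototype string closest to the query."""
    nearest = min(prototypes, key=lambda p: edit_distance(query, p, cost))
    return prototypes[nearest]
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3.0 under unit costs, and `classify("colr", {"color": "US", "colour": "UK"})` returns `"US"` because `"color"` is one edit away and `"colour"` is two. Passing the cost function as a parameter keeps the dynamic program unchanged when the unit costs are replaced by other values.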