Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings. It is used, for example, in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study evaluates different PSA methods at the alignment level rather than via the distances they induce. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms against a manually corrected gold standard. The algorithms evaluated are three variants of the Levenshtein algorithm and the Pair Hidden Markov Model. Our results show that all algorithms perform very well, producing correct alignments in around 95% of cases, but that there are specific qualitative differences in the (mis)alignments of the different algorithms.
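
To make the distinction between a distance and an alignment concrete, the following is a minimal sketch of the plain Levenshtein algorithm with a backtrace that recovers one optimal alignment. It is an illustration only, not one of the variants evaluated in the study: the unit costs, the gap symbol "-", the tie-breaking order in the backtrace, and the function name levenshtein_align are all assumptions made for this example.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol, chosen only for this illustration


def levenshtein_align(a: str, b: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return the edit distance and one optimal alignment of a and b.

    Assumes unit costs for insertion, deletion, and substitution.
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion from a
                dist[i][j - 1] + 1,        # insertion into a
            )

    # Walk back from the bottom-right cell, preferring the diagonal move.
    # Other tie-breaking orders yield other, equally cheap alignments.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


if __name__ == "__main__":
    distance, alignment = levenshtein_align("kitten", "sitting")
    print(distance)
    print(" ".join(x for x, _ in alignment))
    print(" ".join(y for _, y in alignment))
```

Two string pairs can share the same distance while differing in which segments are paired, and a single pair can have several alignments of equal cost. That is why a comparison at the alignment level can reveal differences that a comparison of distances alone would hide.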