Evaluating the pairwise string alignment of pronunciations

Authors:
Martijn Wieling;Jelena Prokić;John Nerbonne
Affiliations:
University of Groningen, The Netherlands;University of Groningen, The Netherlands;University of Groningen, The Netherlands
Venue:
LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education
Year:
2009

Abstract

Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used, for example, in dialectology, historical linguistics, transliteration, and the evaluation of name distinctiveness. The current study evaluates different PSA methods at the alignment level rather than via the distances they induce. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms against a manually corrected gold standard. The algorithms evaluated are three variants of the Levenshtein algorithm and the Pair Hidden Markov Model. Our results show that while all algorithms perform very well, producing correct alignments in around 95% of cases, there are specific qualitative differences in the (mis)alignments of the different algorithms.
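
To illustrate what evaluating "at the alignment level" can mean, the following is a minimal sketch of a plain Levenshtein alignment with unit costs, followed by a simple comparison against gold-standard alignments. It is not claimed to be any of the three Levenshtein variants or the evaluation procedure used in the study: the function names `levenshtein_align` and `alignment_accuracy`, the gap symbol `"-"`, and the tie-breaking order in the backtrace are all assumptions made for illustration.

```python
GAP = "-"  # assumed gap symbol; it must not occur in the input strings


def levenshtein_align(a, b):
    """Return (distance, alignment) for sequences a and b using unit costs.

    The alignment is a list of (segment_a, segment_b) pairs, where GAP
    marks an insertion or deletion.
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)

    # Trace back from the bottom-right cell. Ties are broken in the order
    # match/substitution, deletion, insertion (an arbitrary choice here).
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


def alignment_accuracy(predicted, gold):
    """Fraction of predicted alignments that are identical to the gold ones."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Two strings can have the same edit distance under several different alignments, which is why a distance-based evaluation can hide differences that an alignment-level comparison such as `alignment_accuracy` would expose.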