Evaluating the pairwise string alignment of pronunciations

  • Authors:
  • Martijn Wieling; Jelena Prokić; John Nerbonne

  • Affiliations:
  • University of Groningen, The Netherlands; University of Groningen, The Netherlands; University of Groningen, The Netherlands

  • Venue:
  • LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education
  • Year:
  • 2009

Abstract

Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used, e.g., in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study focuses on evaluating different PSA methods at the alignment level instead of via the distances they induce. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms with a manually corrected gold standard. The algorithms evaluated include three variants of the Levenshtein algorithm as well as the Pair Hidden Markov Model. Our results show that while all algorithms perform very well, aligning around 95% of all string pairs correctly, there are specific qualitative differences in the (mis)alignments of the different algorithms.
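
To make the distinction between an alignment and the distance it induces concrete, the following is a minimal sketch of the plain Levenshtein algorithm with a backtrace. It assumes unit costs for insertion, deletion, and substitution. It is not one of the three variants evaluated in the paper, and it is not the Pair Hidden Markov Model. The function names `levenshtein_align` and `alignment_accuracy` are hypothetical, introduced only for illustration.

```python
def levenshtein_align(a, b):
    """Return (distance, alignment) for two sequences of segments.

    Assumption: unit cost for insertion, deletion and substitution.
    The alignment is a list of (segment_a, segment_b) pairs, with "-"
    marking a gap.
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # deletion from a
                dist[i][j - 1] + 1,        # insertion from b
            )

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    # Ties are broken in favour of the diagonal step, then deletion.
    i, j = n, m
    pairs = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


def alignment_accuracy(predicted, gold):
    """Fraction of predicted alignments that are identical to the gold ones."""
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)
```

Two algorithms can assign the same distance to a pair of strings and still produce different alignments, because several optimal paths may exist and the tie-breaking rule picks one. Comparing the returned pairs against a gold standard, as `alignment_accuracy` does under the assumption of exact-match scoring, is one simple way to evaluate at the alignment level.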