Inducing sound segment differences using Pair Hidden Markov Models

Authors:
Martijn Wieling;Therese Leinonen;John Nerbonne
Affiliations:
University of Groningen;University of Groningen;University of Groningen
Venue:
SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
Year:
2007

Citing 3

Computational dialectology in Irish Gaelic
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics

Evaluation of string distance algorithms for dialectology
LD '06 Proceedings of the Workshop on Linguistic Distances

Computing word similarity and identifying cognates with pair hidden Markov models
CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning

Cited 3

Evaluating the pairwise string alignment of pronunciations
LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education

Transliteration system using pair HMM with weighted FSTs
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration

Mining transliterations from Wikipedia using pair HMMs
NEWS '10 Proceedings of the 2010 Named Entities Workshop

Abstract

Pair Hidden Markov Models (PairHMMs) are trained to align the pronunciation transcriptions of a large contemporary collection of Dutch dialect material, the Goeman-Taeldeman-Van Reenen-Project (GTRP, collected 1980-1995). We focus on the question of how to incorporate information about sound segment distances to improve sequence distance measures for use in dialect comparison. PairHMMs induce segment distances via expectation maximisation (EM). Our analysis uses a phonologically comparable subset of 562 items for all 424 localities in the Netherlands. We evaluate the work in two ways: first, by comparison to analyses obtained using the Levenshtein distance on the same dataset, and second, by comparing the quality of the induced vowel distances to acoustic differences.
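
As a rough illustration of how segment distances can enter a sequence distance measure, the sketch below computes a Levenshtein distance in which the substitution cost comes from a pluggable function. It is not the authors' implementation and does not train a PairHMM. The function name `weighted_levenshtein`, the helper `toy_cost`, the vowel set, and the cost of 0.5 are hypothetical choices made for this example; in the setting of the abstract, the substitution costs would instead be derived from the EM-trained model.

```python
from typing import Callable, Optional, Sequence


def weighted_levenshtein(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Optional[Callable[[str, str], float]] = None,
    indel_cost: float = 1.0,
) -> float:
    """Edit distance between two segment sequences.

    With the default sub_cost this is the plain Levenshtein distance
    (cost 1 for every insertion, deletion, and substitution).
    """
    if sub_cost is None:
        # Binary segment distance: identical segments cost nothing.
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,         # delete x
                curr[j - 1] + indel_cost,     # insert y
                prev[j - 1] + sub_cost(x, y)  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical segment costs, NOT values induced by a PairHMM:
# vowel-for-vowel substitutions are made cheaper than other substitutions.
VOWELS = set("aeiou")


def toy_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    if x in VOWELS and y in VOWELS:
        return 0.5
    return 1.0
```

Calling `weighted_levenshtein("melk", "malk")` with the default cost treats the vowel change like any other substitution, whereas passing `sub_cost=toy_cost` makes the two forms count as closer. Graded segment costs of this kind are what allow a sequence distance to reflect that some sound differences are smaller than others.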