Evaluation of string distance algorithms for dialectology

  • Authors:
  • Wilbert Heeringa; Peter Kleiweg; Charlotte Gooskens; John Nerbonne

  • Affiliations:
  • University of Groningen; University of Groningen; University of Groningen; University of Groningen

  • Venue:
  • LD '06 Proceedings of the Workshop on Linguistic Distances
  • Year:
  • 2006

Abstract

We examine various string distance measures for their suitability in modeling dialect distance, especially as it is perceived by speakers. We find that measures which do not normalize for word length, but which are sensitive to the order of segments, perform best. We likewise find evidence for the superiority of measures that incorporate sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental differences when calculating sequence distances.
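To make the contrasts above concrete, the following is a minimal Python sketch of a Levenshtein distance over segment sequences. It assumes binary segment costs (0 for identical segments, 1 otherwise) and unit insertion/deletion costs; all function names (`binary_cost`, `levenshtein`, `normalized`, `ngrams`) are hypothetical. The normalized variant divides by the length of the longer sequence, which is one common convention and an assumption here, not necessarily the variant evaluated in the paper. The n-gram variant aligns overlapping n-grams instead of single segments, with an assumed padding symbol `#` at word edges and binary cost between n-grams, as one possible way of adding phonological context.

```python
from typing import Callable, Sequence


def binary_cost(a: str, b: str) -> float:
    """Binary segment difference: 0 if the units are identical, 1 otherwise."""
    return 0.0 if a == b else 1.0


def levenshtein(
    x: Sequence[str],
    y: Sequence[str],
    subst: Callable[[str, str], float] = binary_cost,
    indel: float = 1.0,
) -> float:
    """Unnormalized, order-sensitive edit distance between two unit sequences."""
    m, n = len(x), len(y)
    # d[i][j] holds the cheapest cost of turning x[:i] into y[:j].
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel  # delete all of x[:i]
    for j in range(1, n + 1):
        d[0][j] = j * indel  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # deletion
                d[i][j - 1] + indel,  # insertion
                d[i - 1][j - 1] + subst(x[i - 1], y[j - 1]),  # substitution or match
            )
    return d[m][n]


def normalized(
    x: Sequence[str],
    y: Sequence[str],
    subst: Callable[[str, str], float] = binary_cost,
    indel: float = 1.0,
) -> float:
    """Length-normalized variant (assumed convention: divide by the longer length)."""
    longest = max(len(x), len(y))
    return levenshtein(x, y, subst, indel) / longest if longest else 0.0


def ngrams(word: str, n: int) -> list[str]:
    """Overlapping n-grams of a word, padded with the assumed edge symbol '#'."""
    if n == 1:
        return list(word)
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    a, b = "kitten", "sitting"
    print(levenshtein(list(a), list(b)))            # segment-based, unnormalized
    print(normalized(list(a), list(b)))             # segment-based, normalized
    print(levenshtein(ngrams(a, 2), ngrams(b, 2)))  # bigram-based, unnormalized
```

With these assumptions, "kitten" and "sitting" are 3 apart on single segments (3/7 after normalization) and 6 apart on padded bigrams, because a single segment change now affects every bigram that contains it. A gradual variant would replace `binary_cost` with a function returning values between 0 and 1 based on phonetic similarity; the abstract reports no clear benefit from doing so.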