Recovering dialect geography from an unaligned comparable corpus

  • Authors: Yves Scherrer
  • Affiliations: LATL, Université de Genève, Geneva, Switzerland
  • Venue: Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
  • Year: 2012

Abstract

This paper proposes a simple metric of dialect distance, based on the ratio between identical word pairs and cognate word pairs occurring in two texts. Different variations of this metric are tested on a corpus containing comparable texts from different Swiss German dialects and evaluated on the basis of spatial autocorrelation measures. The visualization of the results as cluster dendrograms shows that closely related dialects are reliably clustered together, while multidimensional scaling produces graphs that show high agreement with the geographic localization of the original texts.
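
To make the idea of a ratio between identical and cognate word pairs concrete, the following is a minimal sketch and not the paper's actual implementation. It rests on several assumptions that the abstract does not confirm: cognates are approximated by a normalized edit-similarity threshold (the hypothetical `is_cognate` helper and its `threshold` value of 0.7), identical pairs are counted among the cognate pairs, word types rather than tokens are compared, and the distance is taken as one minus the ratio.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_cognate(a: str, b: str, threshold: float = 0.7) -> bool:
    """Hypothetical cognate test (an assumption, not the paper's criterion):
    two words count as cognates if their normalized edit similarity
    reaches the threshold."""
    longest = max(len(a), len(b))
    if longest == 0:
        return False
    return 1.0 - levenshtein(a, b) / longest >= threshold


def dialect_distance(text_a: str, text_b: str, threshold: float = 0.7) -> float:
    """Assumed distance: 1 - (identical pairs / cognate pairs),
    computed over the word types of two whitespace-tokenized texts."""
    types_a = set(text_a.lower().split())
    types_b = set(text_b.lower().split())
    identical = 0
    cognate = 0
    for wa, wb in product(types_a, types_b):
        if wa == wb:
            # Identical pairs are treated as a subset of cognate pairs.
            identical += 1
            cognate += 1
        elif is_cognate(wa, wb, threshold):
            cognate += 1
    if cognate == 0:
        return 1.0  # no comparable material: treat as maximally distant
    return 1.0 - identical / cognate
```

Under these assumptions, two texts that share every word form get a distance of 0, and the distance grows as more of the cognate pairs differ in form. A matrix of such pairwise distances could then be passed to a clustering or multidimensional scaling routine for the kind of visualization the abstract describes.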