Applications of graph theory to an English rhyming corpus

  • Authors: Morgan Sonderegger
  • Venue: Computer Speech and Language
  • Year: 2011

Abstract

How much can we infer about the pronunciation of a language, past or present, by observing which words its speakers rhyme? This paper explores the connection between pronunciation and network structure in sets of rhymes. We consider the rhyme graphs corresponding to rhyming corpora, where nodes are words and edges are observed rhymes. We describe the graph G corresponding to a corpus of ~12,000 rhymes from English poetry written c. 1900, and find a close correspondence between graph structure and pronunciation: most connected components show community structure that reflects the distinction between full and half rhymes. We build classifiers for predicting which components correspond to full rhymes, using a set of spectral and non-spectral features. Feature selection yields a small number (1 to 5) of spectral features, with accuracy and F-measure of ~90%, reflecting that positive components (those corresponding to full rhymes) are essentially those without any good partition. We partition the components of G by maximizing modularity, giving a new graph, G', in which the "quality" of components, by several measures, is much higher than in G. We discuss how rhyme graphs could be used for historical pronunciation reconstruction.
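The pipeline the abstract describes (rhyme pairs to graph, graph to connected components, components split by modularity) can be illustrated with a minimal sketch. It assumes an unweighted graph (repeated rhymes collapse into one edge) and finds each component's maximum-modularity partition by brute-force enumeration of all set partitions, which is only feasible for very small components; the paper's actual optimization method is not stated in this abstract, and the function names and the love/move example pairs are hypothetical. Modularity is computed as Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c its total degree, and m the number of edges in the component.

```python
from collections import defaultdict


def build_rhyme_graph(rhyme_pairs):
    """Undirected, unweighted rhyme graph: nodes are words, edges are observed rhymes."""
    adj = defaultdict(set)
    for a, b in rhyme_pairs:
        a, b = a.lower(), b.lower()
        if a != b:  # skip identical-word pairs (an assumption of this sketch)
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def connected_components(adj):
    """Return the connected components of the graph as a list of node sets (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        comps.append(comp)
    return comps


def modularity(adj, nodes, communities):
    """Q = sum over communities of (internal edges / m) - (total degree / 2m)^2."""
    m = sum(len(adj[u]) for u in nodes) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a block of its own
        for i in range(len(part)):  # or `first` added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def max_modularity_partition(adj, comp, max_size=10):
    """Brute-force maximum-modularity partition of one small component."""
    nodes = sorted(comp)
    if len(nodes) > max_size:
        raise ValueError("brute force is only feasible for small components")
    # The trivial partition (whole component) has Q = 0 and is kept unless beaten.
    best, best_q = [set(nodes)], modularity(adj, nodes, [set(nodes)])
    for part in set_partitions(nodes):
        blocks = [set(b) for b in part]
        q = modularity(adj, nodes, blocks)
        if q > best_q + 1e-12:
            best, best_q = blocks, q
    return best, best_q


if __name__ == "__main__":
    # Hypothetical pairs: two full-rhyme sets joined by the half rhyme love/move.
    pairs = [("love", "above"), ("above", "dove"), ("dove", "love"),
             ("move", "prove"), ("prove", "groove"), ("groove", "move"),
             ("love", "move"), ("day", "way")]
    graph = build_rhyme_graph(pairs)
    for comp in connected_components(graph):
        blocks, q = max_modularity_partition(graph, comp)
        print(sorted(comp), "->", [sorted(b) for b in blocks], round(q, 3))
```

In this toy example the love/move component splits along the half rhyme into {love, above, dove} and {move, prove, groove} (Q = 5/14 ≈ 0.357), while the day/way component has no partition better than the trivial one (Q = 0) and stays whole, mirroring the abstract's observation that full-rhyme components are those without any good partition.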