Investigating the Relationship Between Linguistic Representation and Computation through an Unsupervised Model of Human Morphology Learning

  • Authors: Erwin Chan; Constantine Lignos
  • Affiliations: University of Arizona, Tucson, USA; University of Pennsylvania, Philadelphia, USA
  • Venue: Research on Language and Computation
  • Year: 2010

Abstract

We develop an unsupervised algorithm for morphological acquisition to investigate the relationship between linguistic representation, data statistics, and learning algorithms. To model the observation that children acquire the morphological inflections of a language monotonically, we introduce an algorithm whose bootstrapped, frequency-driven learning procedure likewise acquires rules monotonically. The algorithm learns a morphological grammar in terms of a Base and Transforms representation, a simple rule-based model of morphology. When tested on corpora of child-directed speech in English from CHILDES (MacWhinney in The CHILDES project: Tools for analyzing talk. Erlbaum, Hillsdale, 2000), the algorithm learns the most salient rules of English morphology, and the order of acquisition is similar to that of children as observed by Brown (A first language: the early stages. Harvard University Press, Cambridge, 1973). Investigations of statistical distributions in corpora reveal that the algorithm is able to acquire morphological grammars because it exploits Zipfian distributions in morphology through type-frequency statistics. These investigations suggest that the computation and frequency-driven selection of discrete morphological rules may be important factors in children's acquisition of basic inflectional morphological systems.
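
To make the idea of type-frequency-driven rule selection concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the function name `score_transforms`, the toy lexicon, and the fixed suffix inventory are all assumptions introduced here for illustration. It scores each candidate suffix-replacement rule by the number of distinct word types it relates, which is the kind of statistic a frequency-driven learner could use to rank rules.

```python
from collections import Counter


def score_transforms(vocab, suffixes):
    """Score candidate transforms (s1 -> s2) by type frequency.

    Hypothetical helper for illustration only. A transform's score is the
    number of distinct word types ending in s1 whose counterpart with s2
    substituted also occurs in the vocabulary.
    """
    scores = Counter()
    for s1 in suffixes:
        for s2 in suffixes:
            if s1 == s2:
                continue
            for word in vocab:
                if not word.endswith(s1):
                    continue
                # Strip s1 (the empty suffix leaves the word unchanged).
                stem = word[:len(word) - len(s1)]
                if stem and stem + s2 in vocab:
                    scores[(s1, s2)] += 1
    return scores


# Toy lexicon (an assumption; not drawn from CHILDES).
lexicon = {
    "walk", "walks", "walked", "walking",
    "jump", "jumps", "jumped", "jumping",
    "play", "plays", "playing",
    "talk", "talks",
    "dog", "dogs",
    "the",
}

scores = score_transforms(lexicon, ["", "s", "ed", "ing"])

# List the base-form transforms in descending order of type frequency.
for (s1, s2), count in scores.most_common():
    if s1 == "":
        print(f"add '{s2}': {count} types")
```

In this toy lexicon the rule that adds "s" relates the most word types, followed by "ing" and then "ed". A learner that repeatedly picks the highest-scoring rule would therefore acquire them in that order. The sketch does not decide which member of a pair is the base, and it omits the bootstrapping step described in the abstract.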