Learning bias and phonological-rule induction

  • Authors:
  • Daniel Gildea;Daniel Jurafsky

  • Affiliations:
  • University of California at Berkeley;University of Colorado at Boulder

  • Venue:
  • Computational Linguistics
  • Year:
  • 1996

Quantified Score

Hi-index 0.02

Visualization

Abstract

A fundamental debate in the machine learning of language has been the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empirical approaches use a general, domain-independent learning rule (Error Back-Propagation, Instance-based Generalization, Minimum Description Length) to learn linguistic generalizations directly from the data.In this paper we suggest that an alternative to the purely nativist or purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases, which guide an empirical inductive learning algorithm. We test our idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, was incapable of learning simple phonological rules like flapping. We then augmented OSTIA with three kinds of learning biases that are specific to natural language phonology, and that are assumed explicitly or implicitly by every theory of phonology: faithfulness (underlying segments tend to be realized similarly on the surface), community (similar segments behave similarly), and context (phonological rules need access to variable in their context). These biases are so fundamental to generative phonology that they are left implicit in many theories. But explicitly modifying the OSTIA algorithm with these biases allowed it to learn more compact, accurate, and general transducers, and our implementation successfully learns a number of rules from English and German. Furthermore, we show that some of the remaining errors in our augmented model are due to implicit biases in the traditional SPE-style rewrite system that are not similarly represented in the transducer formalism, suggesting that while transducers may be formally equivalent to SPE-style rules, they may not have identical evaluation procedures.Because our biases were applied to the learning of very simple SPE-style rules, and to a non-psychologically-motivated and nonprobabilistic theory of purely deterministic transducers, we do not expect that our model as implemented has any practical use as a phonological learning device, nor is it intended as a cognitive model of human learning. Indeed, because of the noise and nondeterminism inherent to linguistic data, we feel strongly that stochastic algorithms for language induction are much more likely to be a fruitful research direction. Our model is rather intended to suggest the kind of biases that may be added to other empiricist induction models, and the way in which they may be added, in order to build a cognitively and computationally plausible learning model for phonological rules.