Extracting Phonetic Knowledge from Learning Systems: Perceptrons, Support Vector Machines and Linear Discriminants

  • Authors:
  • Robert I. Damper, Steve R. Gunn, Mathew O. Gore

  • Affiliations:
  • Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK (all three authors). Contact: rid@ecs.soton.ac.uk

  • Venue:
  • Applied Intelligence
  • Year:
  • 2000


Abstract

Speech perception relies on the human ability to decode continuous, analogue sound pressure waves into discrete, symbolic labels (‘phonemes’) with linguistic meaning. Aspects of this signal-to-symbol transformation have been intensively studied over many decades using psychophysical procedures. The perception of (synthetic) syllable-initial stop consonants has been especially well studied, since these sounds display a marked categorization effect: they are typically dichotomised into ‘voiced’ and ‘unvoiced’ classes according to their voice onset time (VOT). In this case, the category boundary is found to have a systematic relation to the (simulated) place of articulation, but there is no currently accepted explanation of this phenomenon. Categorization effects have now been demonstrated in a variety of animal species as well as humans, indicating that their origins lie in general auditory and/or learning mechanisms, rather than in some ‘phonetic module’ specialized to human speech processing. In recent work, we have demonstrated that appropriately trained computational learning systems (‘neural networks’) also display the same systematic behaviour as human and animal listeners. Networks are trained on simulated patterns of auditory-nerve firings in response to synthetic ‘continua’ of stop-consonant/vowel syllables varying in place of articulation and VOT. Unlike real listeners, such a software model is amenable to analysis aimed at extracting the phonetic knowledge acquired in training, so providing a putative explanation of the categorization phenomenon. Here, we study three learning systems: single-layer perceptrons, support vector machines and Fisher linear discriminants. We highlight similarities and differences between these approaches. We find that support vector machines, a modern inductive inference technique suited to small sample sizes, give the most convincing results. Knowledge extracted from the trained machine indicated that the phonetic percept of voicing is easily and directly recoverable from auditory (but not acoustic) representations.
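To make the voiced/unvoiced dichotomisation concrete, the sketch below trains a single-layer perceptron (the simplest of the three learning systems the abstract names) to split synthetic tokens by VOT. This is a minimal illustration only: the one-dimensional VOT feature, the invented 15–35 ms gap between classes, and the training constants are assumptions for demonstration, not the paper's auditory-nerve firing patterns or its actual category boundaries.

```python
import random

def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Classic perceptron learning rule on 1-D inputs with a bias term.

    labels are -1 (voiced) or +1 (unvoiced); a weight update is made
    only when a token falls on the wrong side of the current boundary.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (w * x + b) <= 0:      # misclassified -> update
                w += lr * y * x
                b += lr * y
    return w, b

random.seed(0)
# Invented synthetic continuum endpoints: 'voiced' tokens get short VOTs
# (0-15 ms, label -1), 'unvoiced' tokens long VOTs (35-60 ms, label +1).
voiced   = [(random.uniform(0, 15), -1) for _ in range(40)]
unvoiced = [(random.uniform(35, 60), +1) for _ in range(40)]
data = voiced + unvoiced
random.shuffle(data)
xs, ys = zip(*data)

# Centre the feature so the perceptron converges quickly on this 1-D task.
mean = sum(xs) / len(xs)
xs = [x - mean for x in xs]

w, b = train_perceptron(xs, ys)
predict = lambda x: 1 if w * x + b > 0 else -1
accuracy = sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)
boundary_vot = mean - b / w   # VOT (ms) where the learned decision flips
```

Since the two classes are linearly separable in VOT, the perceptron finds a perfect boundary somewhere in the 15–35 ms gap; the paper's point is that an analogous boundary, learned from auditory (rather than acoustic) representations, can then be read back out of the trained machine.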