A Computational Model of Unsupervised Speech Segmentation for Correspondence Learning

Authors:
Daniel Duran; Hinrich Schütze; Bernd Möbius; Michael Walsh
Affiliations:
Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany 70174 (all authors)
Venue:
Research on Language and Computation
Year:
2010

Abstract

In this paper, we develop a new conceptual framework for an important problem in language acquisition, the correspondence problem: the fact that a given utterance has different manifestations in the speech and articulation of different speakers and that the correspondence of these manifestations is difficult to learn. We put forward the Correspondence-by-Segmentation Hypothesis, which states that correspondence is primarily learned by first segmenting speech in an unsupervised manner and then mapping the acoustics of different speakers onto each other. We show that a rudimentary segmentation of speech can be learned in an unsupervised fashion. We then demonstrate that, using the previously learned segmentation, different instances of a word can be mapped onto each other with high accuracy when trained on utterance-label pairs for a small set of words.