The limited performance of current speech synthesis and speech recognition systems may result from the fact that these systems are not designed with respect to the human neural processes of speech production and perception. A neurocomputational model of speech production and perception is introduced which is organized according to these neural processes. The production-perception model comprises an artificial, computer-implemented vocal tract as a front-end module, which is capable of generating articulatory speech movements and acoustic speech signals. The structure of the production-perception model comprises motor and sensory processing pathways. Speech knowledge is collected during training stages that imitate early phases of speech acquisition, and this knowledge is stored in artificial self-organizing maps. The current neurocomputational model is capable of producing and perceiving vowels, VC-syllables, and CV-syllables (V = vowel, C = voiced plosive). Basic features of natural speech production and perception are predicted by this model in a straightforward way: the production of speech items is both feedforward and feedback controlled, and phoneme realizations vary within perceptually defined regions; perception is less categorical for vowels than for consonants. Because of its human-like production-perception processing, the model can be regarded as a basic module for more technically oriented approaches to high-quality speech synthesis and high-performance speech recognition.
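The self-organizing maps mentioned above can be illustrated with a minimal Kohonen-style training loop. This is a generic sketch, not the paper's implementation: the grid size, learning-rate and neighborhood schedules, and the two-dimensional "formant-like" input vectors are all assumptions made for illustration.

```python
import math
import random

def train_som(samples, rows=5, cols=5, epochs=50, lr0=0.5, sigma0=2.0):
    """Train a small self-organizing map on low-dimensional feature vectors.

    Illustrative sketch only: parameters and schedules are assumptions,
    not taken from the model described in the abstract.
    """
    random.seed(0)
    dim = len(samples[0])
    # One weight vector per map node, indexed by grid coordinates.
    weights = {(r, c): [random.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)               # decaying learning rate
        sigma = sigma0 * (1 - epoch / epochs) + 0.5   # shrinking neighborhood
        for x in samples:
            # Best-matching unit: node whose weights are closest to the input.
            bmu = min(weights, key=lambda n: sum(
                (w - v) ** 2 for w, v in zip(weights[n], x)))
            for n, w in weights.items():
                d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                weights[n] = [wi + lr * h * (xi - wi)
                              for wi, xi in zip(w, x)]
    return weights

def best_unit(weights, x):
    """Return the map node whose weight vector best matches input x."""
    return min(weights, key=lambda n: sum(
        (w - v) ** 2 for w, v in zip(weights[n], x)))
```

After training on two well-separated clusters of input vectors, distinct regions of the map come to represent distinct inputs, which is the sense in which such a map "stores" acquired speech knowledge.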