Time-frequency representations in speech perception

  • Authors:
  • Pedro Gómez-Vilda;José M. Ferrández-Vicente;Victoria Rodellar-Biarge;Roberto Fernández-Baíllo

  • Affiliations:
  • Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain;Universidad Politécnica de Cartagena, Campus Universitario Muralla del Mar, Pza. Hospital 1, 30202 Cartagena, Spain;Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain;Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Current applications demand a comprehensive view of voice and speech perception in order to build more complex and competitive procedures capable of extracting as much knowledge as possible from sound-based human communication. Many tasks that extract knowledge from speech and voice may share signal-processing procedures, which can be devised from a bio-inspired point of view. The present paper examines a hierarchy of sound-processing functionalities at the auditory and perceptual levels of the auditory neural pathways that can be translated into bio-inspired speech-processing techniques, and analyzes their fundamental characteristics in relation to current tendencies in cognitive audio processing. The pathways linking the peripheral auditory system (cochlear complex) with the brain cortex are briefly examined, with special attention to neuronal structures that show specific capabilities for formant analysis and for building a semantic hierarchy from the time-frequency structure of speech. The aim is to explore their capability of conveying semantics to speech processing and understanding, starting from the minimal acoustic cues with elementary meaning, or "sematoms". The replication of known biological functionality by algorithmic methods through bio-inspiration is a secondary aim of the research. Examples drawn from speech-processing tasks in the domain of acoustic phonetics are presented. These may be applicable to speech recognition, speaker characterization and biometrics, emotion detection, and related tasks.