On speech coding in a perceptual domain

Authors:
G. Kubin;W. Bastiaan Kleijn
Affiliations:
Tech. Univ. Wien, Austria;-
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1999

Abstract

For speech coders in the class of waveform coders, the reconstructed signal approaches the original as the bit rate increases. In such coders, the distortion criterion generally operates on the speech signal itself or on a signal obtained by adaptive linear filtering of the speech signal. To satisfy computational and delay constraints, the distortion criterion must be reduced to a very simple approximation of the auditory system. This drawback of conventional approaches motivates a new speech coding paradigm in which coding is performed in a domain where the single-letter squared-error criterion is an accurate representation of perception. The new paradigm requires a model of the auditory periphery that is accurate, can be inverted with relatively low computational effort, and represents the signal with relatively few parameters. We develop such a model of the auditory periphery and discuss its suitability for speech coding. The results indicate that the new paradigm in general, and our auditory model in particular, form a promising basis for coding both speech and audio at low bit rates.