On pole-zero model estimation methods minimizing a logarithmic criterion for speech analysis

Authors:
Damián Marelli; Peter Balazs
Affiliations:
School of Electrical Engineering and Computer Science, University of Newcastle, Callaghan, NSW, Australia; Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

A speech production model consists of a linear, slowly time-varying filter. Pole-zero models are required to represent certain types of speech sounds well, such as nasals and laterals. From a perceptual point of view, designing them by minimizing a logarithmic criterion appears to be a very suitable approach. The most accurate results available are obtained by using Newton-like search algorithms to optimize either the pole and zero positions or the coefficients of a decomposition into quadratic factors. In this paper, we propose optimizing the numerator and denominator coefficients instead. Experimental results show that this is the most computationally efficient approach, especially when the optimization criterion uses a psychoacoustic frequency scale. To illustrate its applicability in speech processing, we use the proposed method for formant and anti-formant tracking as well as for speech resynthesis.
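
The idea of fitting numerator and denominator coefficients under a logarithmic criterion can be illustrated with a small sketch. The abstract does not give the exact criterion, weighting, or frequency scale, so the code below assumes a mean squared difference of log-magnitudes on a uniform frequency grid. It also swaps in a crude finite-difference gradient descent with backtracking for the Newton-like search mentioned above. The names `freq_response`, `log_criterion`, and `refine`, and the example coefficients, are hypothetical.

```python
import cmath
import math


def freq_response(b, a, omega):
    """Evaluate H(e^{j*omega}) = B(z)/A(z), with coefficients in powers of z^-1."""
    zinv = cmath.exp(-1j * omega)
    num = sum(bk * zinv ** k for k, bk in enumerate(b))
    den = sum(ak * zinv ** k for k, ak in enumerate(a))
    return num / den


def log_criterion(b, a, target_mag, omegas):
    """Assumed criterion: mean squared difference of log-magnitudes."""
    total = 0.0
    for w, t in zip(omegas, target_mag):
        diff = math.log(abs(freq_response(b, a, w))) - math.log(t)
        total += diff * diff
    return total / len(omegas)


def refine(b, a, target_mag, omegas, iters=50, h=1e-6):
    """Gradient descent on [b0..bM, a1..aN] with a0 fixed at 1 (not Newton-like)."""
    nb = len(b)
    theta = list(b) + list(a[1:])

    def cost(t):
        return log_criterion(t[:nb], [1.0] + t[nb:], target_mag, omegas)

    j = cost(theta)
    for _ in range(iters):
        # Forward-difference estimate of the gradient.
        grad = []
        for i in range(len(theta)):
            t = list(theta)
            t[i] += h
            grad.append((cost(t) - j) / h)
        # Backtracking: halve the step until the criterion decreases.
        step, improved = 1.0, False
        while step > 1e-12 and not improved:
            trial = [ti - step * gi for ti, gi in zip(theta, grad)]
            j_trial = cost(trial)
            if j_trial < j:
                theta, j, improved = trial, j_trial, True
            else:
                step *= 0.5
        if not improved:
            break
    return theta[:nb], [1.0] + theta[nb:], j


# Hypothetical target with one zero pair (anti-formant) and one pole pair (formant).
omegas = [math.pi * (i + 0.5) / 64 for i in range(64)]
b_true = [1.0, -1.2, 0.81]
a_true = [1.0, -0.9, 0.9025]
target_mag = [abs(freq_response(b_true, a_true, w)) for w in omegas]

j_true = log_criterion(b_true, a_true, target_mag, omegas)
a_start = [1.0, -0.8, 0.9025]  # perturbed denominator as a starting point
j_start = log_criterion(b_true, a_start, target_mag, omegas)
b_fit, a_fit, j_refined = refine(b_true, a_start, target_mag, omegas)
```

Here the criterion vanishes at the true coefficients and is positive at the perturbed start, and the backtracking step ensures that the refined value never exceeds the starting one. A psychoacoustic frequency scale could be approximated in this sketch by choosing `omegas` on a warped rather than uniform grid, though that choice is an assumption and not taken from the paper.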