Effectiveness of Teager energy operator for epoch detection from speech signals

Authors:
Hemant A. Patil; Srikant Viswanath
Affiliations:
Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India;Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India
Venue:
International Journal of Speech Technology
Year:
2011

References:
Linear Prediction of Speech
Speech synthesis using an aeroacoustic fricative model
Speech nonlinearities, modulations, and energy operators. ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing
Discrete-time speech signal processing: principles and practice
Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm. IEEE Transactions on Audio, Speech, and Language Processing
Epoch Extraction From Speech Signals. IEEE Transactions on Audio, Speech, and Language Processing

Abstract

In this paper, we present the problem of epoch detection from a different perspective: one that deals not only with estimating epoch instances (i.e., glottal activity) but also with quantifying the absence of epochs (i.e., no glottal activity) in the unvoiced regions of the speech signal. Most epoch detection methods perform well in the voiced regions of speech but are not robust in the unvoiced regions, where they detect a number of pseudo epochs. We propose a simple method based on the Teager Energy Operator (TEO) that not only determines the epochs in voiced regions (owing to its superior temporal resolution and its ability to capture airflow properties through the glottis) but is also very effective in unvoiced regions. Recently proposed methods, namely the 0-Hz resonator-based method and the DYPSA method, gave a combined rate (CR) (for detecting epochs in voiced and unvoiced regions of speech) of 74.7% and 60%, respectively, and a pseudo epoch rate (PER) (i.e., spurious epochs in the unvoiced regions of speech) of 62.9% and 54.04%, respectively. In contrast, the proposed method gave a CR of 87% and a PER of 0.27%. This result suggests that the proposed method captures glottal activity more effectively in both voiced and unvoiced regions of the speech signal. The performance of the proposed method is demonstrated on the publicly available CMU-Arctic database, using epoch information from the electro-glottograph (EGG) signal as the ground-truth reference for estimating glottal closure instants (GCI). Owing to the noise suppression capability of the TEO, the proposed method is robust to signal degradations: white, babble, high-frequency, and vehicle noise have little or no effect on it, in contrast to the 0-Hz resonator and DYPSA methods.
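
For readers unfamiliar with the operator named in the abstract, the following is a minimal sketch of the standard discrete-time Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It is not the paper's epoch detection method, whose processing steps are not described in the abstract; the function name `teager_energy` and the test signals are illustrative assumptions. The sketch shows two well-known properties: a pure tone maps to a constant, and an isolated impulse stays confined to a single output sample, which illustrates the temporal resolution the abstract refers to.

```python
import math


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The operator needs one neighbour on each side, so the result is two
    samples shorter than x; output index i corresponds to input index i + 1.
    (Hypothetical helper name, chosen for this illustration.)
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Property 1: for a pure tone A*cos(omega*n) the output is the constant
# (A*sin(omega))^2, i.e. it tracks amplitude and frequency, not phase.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
tone_energy = teager_energy(tone)
tone_expected = (A * math.sin(omega)) ** 2

# Property 2: an isolated impulse yields a single nonzero output sample,
# so the operator does not smear the event in time.
impulse = [0.0] * 11
impulse[5] = 3.0
impulse_energy = teager_energy(impulse)

if __name__ == "__main__":
    print(max(abs(v - tone_expected) for v in tone_energy))
    print(impulse_energy)
```

Because only three adjacent samples enter each output value, sharp excitation events produce sharp peaks in the output; how those peaks are then turned into epoch locations is specific to the paper and is not reproduced here.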