In this paper, we address the speech-based gender identification problem. Mel-Frequency Cepstral Coefficients (MFCC) of voice samples are typically used as the features for gender identification. However, MFCC-based classification incurs high complexity. This paper proposes a novel pitch-based gender identification system with a two-stage classifier to ensure accurate identification at low complexity. The first stage identifies and labels all speakers whose pitch clearly indicates their gender; the complexity of this stage is very low, since only a threshold-based decision rule on a scalar (i.e., pitch) is used. The ambiguous voice samples from all other speakers (which cannot be classified with high accuracy by the first stage, and can be regarded as suspicious speakers or difficult cases) are forwarded to the second stage for finer examination; the second stage of our classifier uses a Gaussian Mixture Model to accurately separate voice samples by gender. Experimental results show that our system is speech language/content independent, microphone independent, and robust against noisy recording conditions. Our system is highly accurate, with a probability of correct classification of 98.65%, and efficient, requiring about 5 s for feature extraction and classification. Copyright © 2011 John Wiley & Sons, Ltd.
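The two-stage decision described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pitch thresholds, the gender-GMM parameter layout, and the function names are all assumptions, and a real system would extract pitch and spectral features from audio rather than receive them as arguments.

```python
import numpy as np

# Illustrative pitch thresholds in Hz (assumed values, not from the paper):
# pitch clearly below/above these bands is treated as unambiguous.
MALE_MAX_PITCH = 160.0
FEMALE_MIN_PITCH = 180.0

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM,
    computed stably via log-sum-exp over the mixture components."""
    x = np.asarray(x, dtype=float)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                - 0.5 * np.sum((x - means) ** 2 / variances, axis=1))
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())

def classify_gender(pitch_hz, features, gmm_male, gmm_female):
    """Two-stage classifier: cheap pitch threshold first, GMMs for hard cases."""
    # Stage 1: scalar threshold on pitch handles the clear cases at low cost.
    if pitch_hz < MALE_MAX_PITCH:
        return "male"
    if pitch_hz > FEMALE_MIN_PITCH:
        return "female"
    # Stage 2: pitch is ambiguous; compare likelihoods under per-gender GMMs
    # trained offline on spectral features of labeled speech.
    lm = gmm_loglik(features, *gmm_male)
    lf = gmm_loglik(features, *gmm_female)
    return "male" if lm > lf else "female"
```

Each GMM here is a `(weights, means, variances)` triple of NumPy arrays with one row per mixture component; samples whose pitch falls in the 160–180 Hz overlap band fall through to the likelihood comparison, which is the expensive path the first stage exists to avoid.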