Modulation spectral features for robust far-field speaker identification

Authors:
Tiago H. Falk; Wai-Yip Chan
Affiliations:
Bloorview Kids Rehab, Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada and Department of Electrical and Computer Engineering, Queen's University, Ki ...; Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

In this paper, auditory-inspired modulation spectral features are used to improve automatic speaker identification (ASI) performance in the presence of room reverberation. The modulation spectral signal representation is obtained by first filtering the speech signal with a 23-channel gammatone filterbank. An eight-channel modulation filterbank is then applied to the temporal envelope of each gammatone filter output. Features are extracted from modulation frequency bands ranging from 3 to 15 Hz and are shown to be robust to mismatch between training and testing conditions and to increasing reverberation levels. To demonstrate the gains obtained with the proposed features, experiments are performed with clean speech, artificially generated reverberant speech, and reverberant speech recorded in a meeting room. Simulation results show that a Gaussian mixture model-based ASI system trained on the proposed features consistently outperforms a baseline system trained on mel-frequency cepstral coefficients. For multimicrophone ASI applications, three multichannel score combination and adaptive channel selection techniques are investigated and shown to further improve ASI performance.
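
The two-stage analysis described in the abstract (acoustic filterbank, temporal envelope, modulation analysis) can be illustrated with a rough sketch. This is not the authors' implementation. The abstract gives only the channel counts (23 and 8) and the 3 to 15 Hz feature range, so everything else here is an assumption made for illustration: the 8 kHz sampling rate, the log-spaced centre frequencies between 125 and 3500 Hz, the fourth-order gammatone shape with ERB-based bandwidth, and the placement of eight log-spaced modulation bands across 3 to 15 Hz. The sketch also replaces the modulation filterbank with sums of envelope-spectrum power over FFT bins, which is a simpler stand-in. All function names are hypothetical.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_fir(fc, fs, duration=0.025, order=4):
    """Truncated impulse response of a gammatone filter centred at fc (assumed shape)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    h = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return h / np.sqrt(np.sum(h ** 2))  # unit-energy normalisation

def hilbert_envelope(x):
    """Temporal envelope as the magnitude of the FFT-based analytic signal."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def modulation_energies(x, fs, n_acoustic=23, n_modulation=8,
                        f_lo=125.0, f_hi=3500.0, m_lo=3.0, m_hi=15.0):
    """Return an (n_acoustic, n_modulation) array of envelope energies per band."""
    centres = np.geomspace(f_lo, f_hi, n_acoustic)     # assumed channel layout
    edges = np.geomspace(m_lo, m_hi, n_modulation + 1)  # assumed band edges
    out = np.zeros((n_acoustic, n_modulation))
    for j, fc in enumerate(centres):
        band = np.convolve(x, gammatone_fir(fc, fs), mode="same")
        env = hilbert_envelope(band)
        env = env - env.mean()                          # drop the DC component
        power = np.abs(np.fft.rfft(env)) ** 2
        freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
        for k in range(n_modulation):
            mask = (freqs >= edges[k]) & (freqs < edges[k + 1])
            out[j, k] = power[mask].sum()
    return out

# Demo: a 1 kHz carrier amplitude-modulated at 5 Hz, two seconds at 8 kHz.
fs = 8000
t = np.arange(2 * fs) / fs
demo = (1.0 + 0.5 * np.cos(2 * np.pi * 5.0 * t)) * np.cos(2 * np.pi * 1000.0 * t)
feats = modulation_energies(demo, fs)
```

For the demo signal, the largest entry of `feats` should fall in the acoustic channel nearest 1 kHz and in the modulation band that contains 5 Hz. In a full system, per-frame versions of such energies would be the input to a speaker model such as the Gaussian mixture model mentioned above.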