Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM

Authors:
Longbiao Wang;Norihide Kitaoka;Seiichi Nakagawa
Affiliations:
Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan
Venue:
Speech Communication
Year:
2007

Abstract

In this paper, we propose a robust speaker recognition method based on position-dependent Cepstral Mean Normalization (CMN), which compensates for channel distortion that depends on the speaker position. In the training stage, the system measures the transmission characteristics from a set of grid points in the room to the microphone and estimates the compensation parameters for each position a priori. In the recognition stage, the system estimates the speaker position, selects the compensation parameters corresponding to the estimated position, applies CMN to the speech, and then performs speaker recognition. In a previous study, we proposed a text-independent speaker recognition method that combines speaker-specific Gaussian mixture models (GMMs) with syllable-based HMMs adapted to the speakers by MAP [Nakagawa, S., Zhang, W., Takahashi, M., 2004. Text-independent speaker recognition by combining speaker-specific GMM with speaker-adapted syllable-based HMM. Proc. ICASSP-2004 1, 81-84]. The robustness of that method to changes in speaking style in a close-talking environment was evaluated in Nakagawa et al. (2004). In this paper, we extend the combination method to distant speaker recognition and integrate it with the proposed position-dependent CMN. Our experiments showed that the proposed method improved speaker recognition performance remarkably in a distant-talking environment.
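The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the compensation parameter for each grid point is taken to be the plain mean cepstrum of training speech recorded at that point, the speaker position is assumed to be already available as a 2-D coordinate from some localization step, and the estimated position is snapped to the nearest grid point. The function names `estimate_position_means`, `nearest_grid_position`, and `position_dependent_cmn` are hypothetical.

```python
import numpy as np


def estimate_position_means(training_cepstra):
    """Training stage (assumed form): one compensation vector per grid point.

    training_cepstra maps a grid position (x, y) to an array of shape
    (num_frames, num_coeffs) with cepstra recorded from that position.
    """
    return {pos: frames.mean(axis=0) for pos, frames in training_cepstra.items()}


def nearest_grid_position(estimated_xy, grid_positions):
    """Snap an estimated speaker location to the closest grid point."""
    grid = np.asarray(grid_positions, dtype=float)
    target = np.asarray(estimated_xy, dtype=float)
    distances = np.linalg.norm(grid - target, axis=1)
    return tuple(grid_positions[int(np.argmin(distances))])


def position_dependent_cmn(cepstra, estimated_xy, position_means):
    """Recognition stage: subtract the mean stored for the nearest grid point."""
    positions = list(position_means.keys())
    pos = nearest_grid_position(estimated_xy, positions)
    return cepstra - position_means[pos]
```

Because the compensation vector is estimated beforehand, this kind of normalization can be applied from the first frame of an utterance, whereas conventional CMN has to accumulate the mean over the utterance itself. The normalized cepstra would then be passed to the speaker models (GMM and adapted HMM), which are outside the scope of this sketch.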