Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM

  • Authors:
  • Longbiao Wang;Norihide Kitaoka;Seiichi Nakagawa

  • Affiliations:
  • Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan;Department of Information and Computer Sciences, Toyohashi University of Technology, 1-1, Hibarigaoka, Tempaku-cho, Toyohashi, Aichi 441-8580, Japan

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we propose a robust speaker recognition method based on position-dependent Cepstral Mean Normalization (CMN) to compensate for the channel distortion depending on the speaker position. In the training stage, the system measures the transmission characteristics according to the speaker positions from some grid points to the microphone in the room and estimates the compensation parameters a priori. In the recognition stage, the system estimates the speaker position and adopts the estimated compensation parameters corresponding to the estimated position, and then the system applies the CMN to the speech and performs speaker recognition. In our past study, we proposed a new text-independent speaker recognition method by combining speaker-specific Gaussian mixture models (GMMs) with syllable-based HMMs adapted to the speakers by MAP [Nakagawa, S., Zhang, W., Takahashi, M., 2004. Text-independent speaker recognition by combining speaker-specific GMM with speaker-adapted syllable-based HMM. Proc. ICASSP-2004 1, 81-84]. The robustness of this speaker recognition method for the change of the speaking style in close-talking environment was evaluated in (Nakagawa et al., 2004). In this paper, we extend this combination method to distant speaker recognition and integrate this method with the proposed position-dependent CMN. Our experiments showed that the proposed method improved the speaker recognition performance remarkably in a distant environment.