Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification

Authors:
Mohamed Abdel Fattah;Fuji Ren;Shingo Kuroiwa
Affiliations:
The authors are with the University of Tokushima, Tokushima-shi, 770-8506 Japan. E-mail: mohafi@is.tokushima-u.ac.jp
Venue:
IEICE - Transactions on Information and Systems
Year:
2006

Abstract

In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion introduced by feature compression at the front end increases the variance flooring effect, which in turn increases the identification error rate. The penalty for reducing the bit rate is degraded speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, the speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05% in a speaker identification task using segments that average 20.4% of a testing utterance. We also achieved an equal error rate (EER) of 0.42% in a speaker verification task using segments that average 15.1% of a testing utterance.
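The client-side selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the selection criterion, so everything specific here is an assumption made for illustration: the `Segment` type, the `TYPE_WEIGHT` and `PHONEME_TYPE` tables, the reading of "frequency" as the occurrence count of a phoneme within the utterance, and the greedy budgeted selection in `select_segments`. It is not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One phoneme-labelled span from a hypothetical client-side segmenter."""
    phoneme: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


# Hypothetical weights per phoneme type; the values are NOT from the paper.
TYPE_WEIGHT = {"vowel": 1.0, "nasal": 0.8, "fricative": 0.4, "stop": 0.2}

# Hypothetical phoneme-to-type lookup covering only the example below.
PHONEME_TYPE = {
    "aa": "vowel", "iy": "vowel",
    "n": "nasal", "m": "nasal",
    "s": "fricative", "t": "stop",
}


def select_segments(segments, budget_fraction):
    """Greedily keep the highest-scoring segments within a duration budget.

    Score = type weight * number of occurrences of the phoneme in the
    utterance (an assumed stand-in for "type and frequency").
    Returns the kept segments in time order and the fraction of the
    utterance duration they cover.
    """
    total_ms = sum(s.duration_ms for s in segments)
    if total_ms == 0:
        return [], 0.0

    counts = Counter(s.phoneme for s in segments)

    def score(seg):
        phoneme_type = PHONEME_TYPE.get(seg.phoneme, "")
        return TYPE_WEIGHT.get(phoneme_type, 0.0) * counts[seg.phoneme]

    budget_ms = budget_fraction * total_ms
    kept, used_ms = [], 0
    # sorted() is stable, so ties keep their original time order.
    for seg in sorted(segments, key=score, reverse=True):
        if used_ms + seg.duration_ms <= budget_ms:
            kept.append(seg)
            used_ms += seg.duration_ms

    kept.sort(key=lambda s: s.start_ms)
    return kept, used_ms / total_ms


utterance = [
    Segment("s", 0, 100),
    Segment("iy", 100, 300),
    Segment("n", 300, 400),
    Segment("t", 400, 500),
    Segment("iy", 500, 700),
    Segment("aa", 700, 1000),
]

kept, fraction = select_segments(utterance, budget_fraction=0.5)
print([s.phoneme for s in kept], fraction)
```

In this sketch only the kept segments would be encoded and sent to the server, so the returned fraction plays the role of the "average segment" percentages quoted in the abstract. A real system would derive the weights from measured per-phoneme recognition performance rather than a hand-written table.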