Robust regression fusion of GMM-UBM and GMM-SVM normalized scores using G729 bit-stream for speaker recognition over IP

Authors:
Dalila Yessad;Abderrahmane Amrouche
Affiliations:
LCPTS, Speech Communication and Signal Processing Lab., Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, Alger 16111;LCPTS, Speech Communication and Signal Processing Lab., Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, Alger 16111
Venue:
International Journal of Speech Technology
Year:
2014



Abstract

A novel approach based on robust regression fusion of normalized scores, termed Normalized Scores following Robust Regression Fusion (NSRRF), is proposed to enhance speaker recognition over IP networks. It can be used in both Network Speaker Recognition (NSR) and Distributed Speaker Recognition (DSR) systems. The framework assumes that speech is encoded by the G729 coder on the client side and then transmitted to the server side, where the automatic speaker recognition (ASR) systems are located. Speaker recognition uses the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and GMM supervector Support Vector Machine (GMM-SVM) classifiers with normalized scores. The feature vectors consist of Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), both derived from the Line Spectral Pairs (LSP) extracted from the G729 bit-stream over IP. Experiments conducted with the LIA SpkDet system, based on the ALIZE platform, on the ARADIGITS database show first that features extracted directly from the G729 bit-stream significantly reduce the error rate and outperform the baseline ASR-over-IP system, which operates on resynthesized (reconstructed) speech obtained from the G729 decoder. In addition, the results show that the proposed approach, score normalization followed by robust regression fusion, achieves the best results and outperforms conventional ASR over IP networks.
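
The idea of fusing normalized scores from two classifiers with a robust regression can be illustrated with a minimal sketch. The abstract does not say which normalization or which robust estimator the authors use, so everything below is an assumption made for illustration: Z-norm against a cohort of impostor scores stands in for the normalization, and a Huber-loss linear regression fitted by iteratively reweighted least squares (IRLS) stands in for the robust regression. The function names (z_normalize, huber_fusion_weights, fuse) and the parameters delta and iters are hypothetical and are not taken from the paper or from the ALIZE toolkit.

```python
import statistics


def z_normalize(scores, cohort):
    """Z-norm (assumed here): shift and scale by cohort mean and std."""
    mu = statistics.fmean(cohort)
    sigma = statistics.pstdev(cohort)
    return [(s - mu) / sigma for s in scores]


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def huber_fusion_weights(s1, s2, targets, delta=1.0, iters=50):
    """Fit target ~ w0 + w1*s1 + w2*s2 under Huber loss using IRLS.

    s1 and s2 are normalized scores from the two systems (for example
    GMM-UBM and GMM-SVM); targets are the desired fused outputs.
    With iters=1 the result is the ordinary least-squares fit.
    """
    rows = [[1.0, a, b] for a, b in zip(s1, s2)]
    obs_w = [1.0] * len(rows)  # per-trial weights, all 1 on the first pass
    w = [0.0, 0.0, 0.0]
    for _ in range(iters):
        # Weighted normal equations (A^T W A) w = A^T W y.
        ata = [[sum(k * r[i] * r[j] for k, r in zip(obs_w, rows))
                for j in range(3)] for i in range(3)]
        atb = [sum(k * r[i] * y for k, r, y in zip(obs_w, rows, targets))
               for i in range(3)]
        w = solve_linear(ata, atb)
        resid = [y - sum(wi * ri for wi, ri in zip(w, r))
                 for r, y in zip(rows, targets)]
        # Huber weights: trials with large residuals are down-weighted.
        obs_w = [1.0 if abs(e) <= delta else delta / abs(e) for e in resid]
    return w


def fuse(w, a, b):
    """Fused score for one trial from the two normalized scores."""
    return w[0] + w[1] * a + w[2] * b
```

In this sketch a trial whose residual exceeds delta contributes only linearly to the loss, so a few badly scored trials pull the fusion weights far less than they would under ordinary least squares. Whether this matches the estimator used in the paper cannot be determined from the abstract.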