A pertinent learning machine input feature for speaker discrimination by voice

Authors:
S. Ouamour;H. Sayoud
Affiliations:
Institute of Electronics, USTHB University, Algiers, Algeria;Institute of Electronics, USTHB University, Algiers, Algeria
Venue:
International Journal of Speech Technology
Year:
2012

Abstract

This work is part of a global speech indexing project entitled ISDS and concerns two types of machine learning classifiers used by that project: Neural Networks (NN) and Support Vector Machines (SVM). In the present paper, however, we deal only with the problem of speaker discrimination using a new relative, reduced model of the speaker, and we restrict our analysis to the new relative speaker characteristic used as the input feature of the learning machines (NN and SVM). Speaker discrimination consists of checking whether or not two speech signals belong to the same speaker, using features of the speaker taken directly from his or her own speech. Our proposed feature is based on a relative characterization of the speaker, called the Relative Speaker Characteristic (RSC), and is well suited to NN and SVM training. RSC consists of modeling one speaker relative to another, meaning that each speaker model is determined from both the speaker's own speech signal and its dual speech. This investigation shows that the relative model, used as the classifier input, improves training by shortening the learning time and enhancing the discrimination accuracy of the classifier.

Speaker discrimination experiments are conducted on two databases, the Hub4 Broadcast-News database and a telephone speech database, using two learning machines: a Multi-Layer Perceptron (MLP) and a Support Vector Machine (SVM) with several input characteristics. A further comparative investigation is conducted on the same databases using two classical discriminative measures: the covariance-based mono-Gaussian distance and the Kullback-Leibler distance.

The originality of this relativist approach is that the new characteristic gives the speaker a flexible model, since it changes every time the competing speaker model changes. Results show that the new input characteristic is of interest for speaker discrimination. Furthermore, using the Relative Speaker Characteristic reduces both the size of the classifier input and the training time.
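
As an illustration of the second classical baseline named above, the following minimal sketch computes a symmetric Kullback-Leibler distance between two speech segments, each summarized by a single Gaussian. It is not taken from the paper: the abstract does not give the exact form of either measure, so the diagonal-covariance simplification, the symmetrization, the function names (`fit_diag_gaussian`, `kl_diag_gaussians`, `symmetric_kl`) and the toy feature frames are all assumptions made for this example.

```python
import math
from statistics import fmean, pvariance


def fit_diag_gaussian(frames):
    """Fit a diagonal-covariance Gaussian to equal-length feature vectors.

    Assumption: one Gaussian per segment with a diagonal covariance,
    a simplification chosen to keep the example dependency-free.
    Every dimension must have non-zero variance.
    """
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) for d in dims]
    return means, variances


def kl_diag_gaussians(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrized KL distance: KL(p || q) + KL(q || p)."""
    return kl_diag_gaussians(p, q) + kl_diag_gaussians(q, p)


# Hypothetical toy "feature frames" for two segments (not real speech data).
segment_a = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
segment_b = [[1.0, 1.5], [3.0, 2.5], [5.0, 6.5]]

model_a = fit_diag_gaussian(segment_a)
model_b = fit_diag_gaussian(segment_b)
distance = symmetric_kl(model_a, model_b)
```

A small distance suggests that the two segments may come from the same speaker; in practice a decision threshold would have to be tuned on held-out data, which this sketch does not attempt.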