A new approach of speaker clustering based on the stereophonic differential energy

Authors:
S. Ouamour;H. Sayoud
Affiliations:
Electronics and Computer Engineering Institute, USTHB University, Alger, Algeria;Electronics and Computer Engineering Institute, USTHB University, Alger, Algeria
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

In this paper, we present a new approach to speech clustering with respect to speaker identity. It consists of grouping the homogeneous speech segments obtained at the end of the segmentation process by using the spatial information provided by stereophonic speech signals. The proposed method uses the differential energy of the two stereophonic signals, collected by two cardioid microphones, to cluster all the speech segments that belong to the same speaker. The total number of clusters obtained at the end should equal the actual number of speakers present in the meeting room, and each cluster should contain all the interventions of a single speaker. The proposed system is suitable for debates or multi-conferences in which the speakers are located at fixed positions.

Basically, our approach attempts to localize the speaker relative to the position of the microphones, taken as a spatial reference. Based on this localization, the proposed method can recognize the speaker identity of any speech segment during the meeting. The intervention of each speaker is thus automatically detected and assigned to that speaker by estimating his or her relative position.

For comparison purposes, two clustering methods have been implemented and tested: the new approach, which we call Energy Differential based Spatial Clustering (EDSC), and a classical statistical approach called "Mono-Gaussian based Sequential Clustering" (MGSC).

Speaker clustering experiments are conducted on a stereophonic speech corpus called DB15, composed of 15 stereophonic scenarios of about 3.5 minutes each. Every scenario corresponds to a free discussion between two or three speakers seated at fixed positions in the meeting room.

Results show the outstanding performance of the new approach in terms of precision and speed, especially for short speech segments, where most clustering techniques fail badly.
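
The idea of clustering segments by the differential energy of the two channels can be illustrated with a minimal sketch. The abstract does not give the exact definition of the differential energy or the grouping rule used by EDSC, so everything below is an assumption made for illustration: the feature is taken to be the log-ratio of left and right channel energies over a segment, and segments are grouped by splitting the sorted feature values wherever the gap exceeds a hypothetical threshold `max_gap_db`. The function names are hypothetical as well.

```python
import math


def differential_energy_db(left, right, eps=1e-12):
    """Assumed feature: energy ratio (in dB) between the two channels of one segment."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def cluster_by_differential_energy(features, max_gap_db=3.0):
    """Group segments whose 1-D features are close to each other.

    Features are sorted, and a new cluster starts whenever the gap to the
    previous sorted value exceeds max_gap_db (a hypothetical threshold).
    Returns one cluster label per input segment, in the original order.
    """
    order = sorted(range(len(features)), key=lambda i: features[i])
    labels = [0] * len(features)
    current = 0
    for rank, idx in enumerate(order):
        if rank > 0 and features[idx] - features[order[rank - 1]] > max_gap_db:
            current += 1
        labels[idx] = current
    return labels


def make_segment(gain_left, gain_right, n=800):
    """Synthetic stereo segment: one tone scaled differently on each channel."""
    tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(n)]
    return [gain_left * s for s in tone], [gain_right * s for s in tone]


# Three simulated seats: near the left microphone, near the right one, in between.
segments = [
    make_segment(1.0, 0.3),  # speaker near left microphone
    make_segment(0.3, 1.0),  # speaker near right microphone
    make_segment(0.9, 0.3),  # left again
    make_segment(0.6, 0.6),  # speaker in the middle
    make_segment(0.3, 0.9),  # right again
]
features = [differential_energy_db(l, r) for l, r in segments]
labels = cluster_by_differential_energy(features)
print(labels)  # [2, 0, 2, 1, 0]
```

In this toy setting the number of distinct labels matches the number of simulated seats, which mirrors the stated goal that the number of clusters should equal the number of speakers. Real recordings would need a more robust grouping rule than a fixed gap threshold.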