Speaker Clustering Aided by Visual Dialogue Analysis

Authors:
Shuang Zhang;Wei Hu;Tao Wang;Jia Liu;Yimin Zhang
Affiliations:
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing, China 100084; Intel China Research Center, Beijing, P.R. China; Intel China Research Center, Beijing, P.R. China; Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing, China 100084; Intel China Research Center, Beijing, P.R. China
Venue:
PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Year:
2008

Abstract

Speaker clustering aims to automatically group speech segments by speaker. With speaker clustering, we can discover the main cast list of a long video and retrieve each cast member's relevant video clips for efficient browsing. In this paper, we propose a dialogue-supervised speaker clustering method, which uses the results of visual dialogue analysis to improve speaker clustering performance. Compared with the traditional approach based only on acoustic features, the dialogue-supervised approach achieves a significant improvement in clustering results on movies and TV series.
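
The abstract does not say how the visual dialogue results enter the clustering. One plausible reading, given here only as an illustrative sketch and not as the authors' method, is that a detected dialogue yields "cannot-link" constraints: speech segments attached to alternating sides of the same dialogue are assumed to come from different speakers and are never merged. The function name `constrained_clustering`, the centroid-distance merge rule, the `threshold` parameter, and the toy feature vectors are all assumptions made for this example.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def constrained_clustering(features, cannot_link, threshold):
    """Agglomerative clustering that respects cannot-link constraints.

    features    -- one acoustic feature vector per speech segment
    cannot_link -- set of frozenset({i, j}) pairs that must stay apart
                   (hypothetically derived from visual dialogue analysis)
    threshold   -- largest centroid distance at which a merge is allowed
    """
    clusters = [[i] for i in range(len(features))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Skip the pair if any two member segments are forbidden to merge.
            if any(frozenset((i, j)) in cannot_link
                   for i in clusters[a] for j in clusters[b]):
                continue
            d = math.dist(centroid([features[i] for i in clusters[a]]),
                          centroid([features[i] for i in clusters[b]]))
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]


# Toy data: segments 0-2 sound alike, segment 3 is clearly different.
features = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [5.0, 5.0]]

# Acoustic features only: segments 0, 1 and 2 collapse into one speaker.
acoustic_only = constrained_clustering(features, set(), threshold=1.0)

# Assumed dialogue constraint: segments 1 and 2 are opposite dialogue turns.
dialogue_aided = constrained_clustering(
    features, {frozenset((1, 2))}, threshold=1.0)

print(acoustic_only)   # [[0, 1, 2], [3]]
print(dialogue_aided)  # [[0, 1], [2], [3]]
```

In this toy case the constraint keeps two acoustically similar segments apart, which is the kind of error that clustering on acoustic features alone cannot avoid.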