Detecting individual role using features extracted from speaker diarization results

Authors:
Benjamin Bigot;Isabelle Ferrané;Julien Pinquier;Régine André-Obrecht
Affiliations:
IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062
Venue:
Multimedia Tools and Applications
Year:
2012

Abstract

In automatic content-based indexing and structuring of audiovisual documents, finding events such as interviews, debates, reports, or live commentaries requires bridging the gap between low-level feature extraction and high-level event detection. In our work, we consider the detection of speaker roles (Anchor, Journalist, and Other) to be a first step towards enriching interaction sequences between speakers. Our work assumes that clues about speaker roles exist in temporal, prosodic, and basic signal features extracted from audio files and from speaker segmentations. Each speaker is therefore represented by a 36-feature vector. Contrary to most state-of-the-art approaches, we do not use the structure of the document to recognize the roles of the speakers. We investigate the influence of two dimensionality reduction techniques (Principal Component Analysis and Linear Discriminant Analysis) and of different classification methods (Gaussian Mixture Models, K-nearest neighbours, and Support Vector Machines). Experiments are carried out on the 13-hour corpus of the ESTER2 evaluation campaign. The best configuration recognizes about 82% of roles correctly, which corresponds to more than 89% of speech duration correctly labelled.
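The abstract reports two different scores: the share of speakers whose role is recognized, and the share of speech duration that ends up correctly labelled. The following minimal Python sketch illustrates how these two measures can diverge. It is not the authors' evaluation code: the function names and the toy speaker list are hypothetical, and the numbers are invented for illustration rather than taken from ESTER2.

```python
from typing import List, Tuple

# Each speaker is (true_role, predicted_role, speech_duration_in_seconds).
# This record layout is an assumption made for the example.
Speaker = Tuple[str, str, float]


def role_accuracy(speakers: List[Speaker]) -> float:
    """Fraction of speakers whose predicted role matches the true role."""
    if not speakers:
        raise ValueError("speaker list is empty")
    correct = sum(1 for true, pred, _ in speakers if true == pred)
    return correct / len(speakers)


def duration_accuracy(speakers: List[Speaker]) -> float:
    """Fraction of total speech duration carried by correctly labelled speakers."""
    total = sum(duration for _, _, duration in speakers)
    if total <= 0:
        raise ValueError("total speech duration must be positive")
    correct = sum(duration for true, pred, duration in speakers if true == pred)
    return correct / total


# Toy data (invented): the two misclassified speakers talk only briefly.
toy_speakers: List[Speaker] = [
    ("Anchor", "Anchor", 600.0),
    ("Journalist", "Journalist", 200.0),
    ("Journalist", "Other", 50.0),
    ("Other", "Other", 100.0),
    ("Other", "Journalist", 50.0),
]

print(f"role accuracy:     {role_accuracy(toy_speakers):.2f}")
print(f"duration accuracy: {duration_accuracy(toy_speakers):.2f}")
```

In this toy case the duration-weighted score is higher than the per-speaker score because the errors fall on speakers with little speech, which is one plausible reading of the gap between the two figures in the abstract.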