Detecting individual role using features extracted from speaker diarization results

  • Authors:
  • Benjamin Bigot;Isabelle Ferrané;Julien Pinquier;Régine André-Obrecht

  • Affiliations:
  • IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062;IRIT--Université de Toulouse, Cedex 09, France 31062

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2012

Abstract

In the field of automatic content-based indexing and structuring of audiovisual documents, finding events such as interviews, debates, reports, or live commentaries requires bridging the gap between low-level feature extraction and high-level event detection. In our work, we consider that detecting speaker roles such as Anchor, Journalist and Other is a first step towards enriching interaction sequences between speakers. Our work relies on the assumption that clues about speaker roles exist in temporal, prosodic and basic signal features extracted from audio files and from speaker segmentations. Each speaker is therefore represented by a 36-feature vector. Contrary to most state-of-the-art proposals, we do not use the structure of the document to recognize the roles of the speakers. We investigate the influence of two dimensionality reduction techniques (Principal Component Analysis and Linear Discriminant Analysis) and of different classification methods (Gaussian Mixture Models, K-nearest neighbours and Support Vector Machines). Experiments are carried out on the 13-hour corpus of the ESTER2 evaluation campaign. The best result reaches about 82% of correctly recognized roles, which corresponds to more than 89% of speech duration correctly labelled.
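
The abstract reports two figures: the share of speakers whose role is recognized, and the share of speech duration that is correctly labelled. The sketch below shows one plausible way to compute both and why the second can be higher than the first. The `Speaker` class, the function names and the toy values are assumptions made for illustration only; they are not taken from the paper or from the ESTER2 corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Speaker:
    """One speaker cluster from a diarization output (hypothetical structure)."""
    true_role: str         # reference role: "Anchor", "Journalist" or "Other"
    predicted_role: str    # role assigned by the classifier
    speech_seconds: float  # total speaking time of this speaker


def role_accuracy(speakers: List[Speaker]) -> float:
    """Fraction of speakers whose role is correctly recognized."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    correct = sum(1 for s in speakers if s.true_role == s.predicted_role)
    return correct / len(speakers)


def duration_accuracy(speakers: List[Speaker]) -> float:
    """Fraction of total speech time attributed to the correct role."""
    total = sum(s.speech_seconds for s in speakers)
    if total <= 0:
        raise ValueError("total speech duration must be positive")
    correct = sum(
        s.speech_seconds for s in speakers if s.true_role == s.predicted_role
    )
    return correct / total


# Toy data (invented): the misclassified speakers talk only briefly.
speakers = [
    Speaker("Anchor", "Anchor", 1200.0),
    Speaker("Journalist", "Journalist", 300.0),
    Speaker("Journalist", "Other", 60.0),
    Speaker("Other", "Other", 90.0),
    Speaker("Other", "Journalist", 30.0),
]

if __name__ == "__main__":
    print(f"role accuracy:     {role_accuracy(speakers):.3f}")
    print(f"duration accuracy: {duration_accuracy(speakers):.3f}")
```

In this toy case the errors fall on speakers with little speaking time, so the duration-weighted score exceeds the per-speaker score. This would be consistent with the gap between the two reported figures, although the abstract does not state the cause.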