In automatic content-based indexing and structuring of audiovisual documents, finding events such as interviews, debates, reports, or live commentaries requires bridging the gap between low-level feature extraction and high-level event detection. We consider the detection of speaker roles (Anchor, Journalist, and Other) to be a first step towards enriching interaction sequences between speakers. Our work relies on the assumption that clues about speaker roles exist in temporal, prosodic, and basic signal features extracted from audio files and from speaker segmentations. Each speaker is therefore represented by a 36-feature vector. Unlike most state-of-the-art approaches, we do not use the structure of the document to recognize the speakers' roles. We investigate the influence of two dimensionality reduction techniques (Principal Component Analysis and Linear Discriminant Analysis) and of different classification methods (Gaussian Mixture Models, K-nearest neighbours, and Support Vector Machines). Experiments are conducted on the 13-hour corpus of the ESTER2 evaluation campaign. The best configuration correctly recognizes about 82% of the roles, which corresponds to more than 89% of speech duration correctly labelled.
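The two figures reported above measure different things: the first counts speakers whose role is correct, while the second weights each speaker by how long they speak. The following is a minimal sketch of that distinction, not the authors' implementation. It uses a plain K-nearest-neighbours vote (one of the classifier families named above) on hypothetical two-dimensional vectors instead of the 36 features, and every value, name, and duration in it is an assumption made up for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a role by majority vote among the k nearest training vectors."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(role for _, role in neighbours)
    return votes.most_common(1)[0][0]


def role_accuracy(true_roles, predicted_roles):
    """Fraction of speakers whose role is correctly recognized."""
    correct = sum(t == p for t, p in zip(true_roles, predicted_roles))
    return correct / len(true_roles)


def duration_accuracy(true_roles, predicted_roles, durations):
    """Fraction of total speech time assigned to the correct role."""
    correct = sum(
        d for t, p, d in zip(true_roles, predicted_roles, durations) if t == p
    )
    return correct / sum(durations)


# Hypothetical training speakers: (feature vector, role).
# Real vectors would hold 36 temporal, prosodic, and signal features.
TRAIN = [
    ((0.9, 0.8), "Anchor"),
    ((0.8, 0.9), "Anchor"),
    ((0.5, 0.4), "Journalist"),
    ((0.4, 0.5), "Journalist"),
    ((0.1, 0.2), "Other"),
    ((0.2, 0.1), "Other"),
]

# Hypothetical test speakers: (feature vector, true role, speech duration in s).
TEST = [
    ((0.85, 0.85), "Anchor", 600.0),
    ((0.45, 0.45), "Journalist", 200.0),
    ((0.15, 0.15), "Other", 100.0),
    ((0.42, 0.48), "Other", 100.0),  # short turn that resembles a Journalist
]

if __name__ == "__main__":
    truth = [role for _, role, _ in TEST]
    durations = [d for _, _, d in TEST]
    predicted = [knn_predict(TRAIN, features) for features, _, _ in TEST]
    print(f"role accuracy:     {role_accuracy(truth, predicted):.2f}")
    print(f"duration accuracy: {duration_accuracy(truth, predicted, durations):.2f}")
```

In this toy setting the only misclassified speaker talks briefly, so the duration-weighted score is higher than the per-speaker score. That is the same direction as the gap between the two reported figures, although the sketch says nothing about why the gap arises on ESTER2.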