This paper discusses estimation of the active speaker in multi-party video-mediated communication from the gaze data of one of the participants. In the explored setting, we predict the voice activity of participants in one room based on gaze recordings of a single participant in another room. The two rooms were connected by high-definition, low-delay audio and video links, and the participants engaged in activities ranging from casual discussion to simple problem-solving games. We treat the task as a classification problem and evaluate several types of features and parameter settings within a Support Vector Machine (SVM) classification framework. The results show that, using the proposed approach, the vocal activity of a speaker can be correctly predicted 89% of the time for which gaze data are available.
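The pipeline the abstract describes — turning an observer's gaze recordings into features and classifying voice activity with an SVM — can be sketched as follows. This is a hedged illustration, not the authors' implementation: the gaze-to-tile feature (fraction of gaze samples landing on each remote participant's video tile), the synthetic gaze data, and the subgradient-trained linear SVM are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaze_features(gaze_x, tiles):
    """Fraction of gaze samples falling on each participant's video tile (assumed feature)."""
    counts = np.array([np.sum((gaze_x >= lo) & (gaze_x < hi)) for lo, hi in tiles])
    return counts / max(len(gaze_x), 1)

# Synthetic setup: 3 remote participants shown side by side (screen x in [0, 1)).
tiles = [(0.0, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 1.0)]
centers = [1 / 6, 1 / 2, 5 / 6]

def make_window(speaker):
    # Assumption: the observer tends to look at the active speaker's tile.
    x = rng.normal(centers[speaker], 0.12, size=50) % 1.0
    return gaze_features(x, tiles)

# Binary task: is participant 0 the active speaker in a given time window?
speakers = rng.integers(0, 3, size=400)
X = np.array([make_window(s) for s in speakers])
y = np.where(speakers == 0, 1.0, -1.0)

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via subgradient descent on the regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # samples violating the margin
        gw = lam * w - (y[mask][:, None] * X[mask]).sum(axis=0) / n
        gb = -y[mask].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train_linear_svm(X[:300], y[:300])
pred = np.where(X[300:] @ w + b >= 0, 1.0, -1.0)
acc = (pred == y[300:]).mean()
```

On this synthetic data the tile-fraction features are strongly class-separable, so even this minimal linear SVM classifies held-out windows well; the paper's reported 89% figure applies to its real recordings, not to this sketch.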