Dominant speaker identification for multipoint videoconferencing

Authors:
Ilana Volfin; Israel Cohen
Affiliations:
Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel; Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
Venue:
Computer Speech and Language
Year:
2013

Abstract

A multipoint conference is an efficient and cost-effective substitute for a face-to-face meeting. It involves three or more participants in separate locations, each using a single microphone and camera. Routing and processing the audiovisual information places a heavy load on the network, which creates a need to reduce the amount of information flowing through the system. One solution is to identify the dominant speaker and partially discard information originating from non-active participants. We propose a novel method for dominant speaker identification that uses speech activity information from time intervals of different lengths. The proposed method processes the audio signal of each participant independently and computes speech activity scores for immediate, medium, and long time intervals. These scores are then compared to identify the dominant speaker. Compared with other speaker selection methods, experimental results demonstrate a reduction in the number of false speaker switches and improved robustness to transient audio interference.
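
The idea of scoring speech activity over several interval lengths can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose scoring and comparison rules are not given in the abstract. The sketch assumes that a binary per-frame voice activity decision is already available for each participant, and it substitutes a simple "fraction of active frames" score for each interval. The names ActivityTracker and select_dominant, the window lengths (5, 25, and 100 frames), and the switching margin are all hypothetical choices made for illustration.

```python
from collections import deque


class ActivityTracker:
    """Per-participant speech activity over three window lengths (in frames).

    Assumption: the caller supplies a binary voice-activity flag per frame.
    The window lengths are illustrative, not taken from the paper.
    """

    def __init__(self, short=5, medium=25, long=100):
        self.windows = (short, medium, long)
        # Only the most recent `long` frames are needed for any of the scores.
        self.history = deque(maxlen=long)

    def update(self, active):
        """Record whether the participant was active in the newest frame."""
        self.history.append(1 if active else 0)

    def scores(self):
        """Return (immediate, medium, long) scores, each in [0, 1]."""
        frames = list(self.history)
        # Dividing by the full window length keeps scores low until
        # enough history has accumulated.
        return tuple(sum(frames[-w:]) / w for w in self.windows)


def select_dominant(trackers, current, margin=0.2):
    """Return the index of the dominant speaker.

    Hypothetical rule: a challenger replaces the current speaker only if it
    leads by at least `margin` on all three time scales, so a short burst
    of activity cannot trigger a switch on its own.
    """
    current_scores = trackers[current].scores()
    best, best_lead = current, 0.0
    for i, tracker in enumerate(trackers):
        if i == current:
            continue
        # Smallest lead over the current speaker across the three scales.
        lead = min(a - b for a, b in zip(tracker.scores(), current_scores))
        if lead >= margin and lead > best_lead:
            best, best_lead = i, lead
    return best
```

Because each tracker is updated only from its own participant's frames, the per-participant processing stays independent, and the participants interact only in the final comparison. Requiring a lead on every time scale is one simple way to suppress switches caused by transient sounds, at the cost of a slower reaction to a genuine change of speaker.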