Speaker localization in CHIL lectures: evaluation criteria and results

Authors:
Maurizio Omologo;Piergiorgio Svaizer;Alessio Brutti;Luca Cristoforetti
Affiliations:
ITC-irst, Povo, Trento, Italy;ITC-irst, Povo, Trento, Italy;ITC-irst, Povo, Trento, Italy;ITC-irst, Povo, Trento, Italy
Venue:
MLMI'05 Proceedings of the Second international conference on Machine Learning for Multimodal Interaction
Year:
2005

Abstract

This work addresses automatic speaker localization and tracking in a real lecture scenario. Evaluation criteria recently adopted under CHIL and NIST benchmarking are outlined. Two speaker localization systems are described, based on Generalized Cross-Correlation Phase Transform (GCC-PHAT) analysis and the Global Coherence Field (GCF). Benchmarking on a set of 13 lectures showed an average RMS localization error of about 30 cm.
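
To make the reported error measure concrete, the following is a minimal sketch of how an RMS localization error could be computed from paired estimated and reference positions. It assumes the error is the root-mean-square Euclidean distance over evaluated frames; the exact CHIL/NIST definition is not given in the abstract and may differ. The function name `rms_error` and the sample coordinates are hypothetical, not taken from the paper.

```python
import math


def rms_error(estimates, references):
    """Root-mean-square Euclidean distance between paired positions.

    Assumption: one estimate per reference position, same units (e.g. metres).
    """
    if len(estimates) != len(references) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    # Squared Euclidean distance for each (estimate, reference) pair.
    squared = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimates, references)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 2-D speaker positions in metres (illustrative, not paper data).
reference = [(1.0, 2.0), (1.5, 2.0), (2.0, 2.5)]
estimated = [(1.3, 2.0), (1.5, 2.4), (2.0, 2.5)]
rms = rms_error(estimated, reference)
```

With these made-up positions the per-frame errors are 0.3 m, 0.4 m and 0 m, giving an RMS of about 0.29 m, the same order of magnitude as the figure quoted in the abstract.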