Locating an active acoustic source in an enclosure remains difficult for several reasons. Localization algorithms operate repeatedly on short frames of microphone data and estimate the current source location from each frame. In a typical room, noise and reverberation degrade these recordings, and silence gaps in speech and competing speakers reduce tracking accuracy further. We present a novel localization and tracking framework based on particle filtering. The filter is driven by information-theoretic detection measures that remain robust in noisy and reverberant environments. Integrating a second particle filter allows the system to track alternating acoustic sources that are located far apart. A further extension integrates a voice activity detection scheme that uses the same detection measures to handle the gaps in human speech. Performance is first examined in simulations that parameterize the results by environmental variables such as reverberation and system geometry. The system is then applied to real-world data from multi-person meetings. The results indicate that the proposed framework outperforms all systems used for comparison in this work while remaining adequately robust in the examined environments.
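To make the frame-by-frame tracking idea concrete, the following is a minimal sketch of a generic bootstrap particle filter, not the framework described above. It rests on several assumptions that do not come from the abstract: a 2-D room, random-walk source dynamics, and a Gaussian likelihood around a noisy per-frame position estimate, which stands in for the information-theoretic detection measures. A measurement of `None` marks a frame that a voice activity detector would have flagged as silence, in which case the filter only predicts. The names `particle_filter_step` and `track` and all parameter values are hypothetical.

```python
import math
import random


def particle_filter_step(particles, weights, measurement, rng,
                         motion_std=0.05, meas_std=0.2):
    """One predict/update/resample cycle (assumed models, see lead-in)."""
    # Predict: random-walk dynamics (an assumption, not the paper's model).
    predicted = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
                 for x, y in particles]

    # Update: reweight by an assumed Gaussian likelihood of the measurement.
    # Frames without a measurement (silence) leave the weights unchanged.
    if measurement is not None:
        mx, my = measurement
        weights = [
            w * math.exp(-((x - mx) ** 2 + (y - my) ** 2) / (2.0 * meas_std ** 2))
            for w, (x, y) in zip(weights, predicted)
        ]

    # Normalize, falling back to uniform weights if all likelihoods underflow.
    total = sum(weights)
    n = len(predicted)
    weights = [w / total for w in weights] if total > 0.0 else [1.0 / n] * n

    # Point estimate: weighted mean of the particle positions.
    estimate = (sum(w * x for w, (x, _) in zip(weights, predicted)),
                sum(w * y for w, (_, y) in zip(weights, predicted)))

    # Resample (multinomial) only when a measurement was used.
    if measurement is not None:
        predicted = rng.choices(predicted, weights=weights, k=n)
        weights = [1.0 / n] * n
    return predicted, weights, estimate


def track(measurements, n_particles=500, room=(5.0, 4.0), seed=0):
    """Run the filter over a sequence of per-frame measurements or None."""
    rng = random.Random(seed)
    particles = [(rng.uniform(0.0, room[0]), rng.uniform(0.0, room[1]))
                 for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for z in measurements:
        particles, weights, est = particle_filter_step(particles, weights, z, rng)
        estimates.append(est)
    return estimates


if __name__ == "__main__":
    # Static source at (2.0, 1.5) with a three-frame silence gap.
    frames = [(2.0, 1.5)] * 10 + [None] * 3 + [(2.0, 1.5)] * 10
    print(track(frames)[-1])
```

Skipping the weight update during silence frames lets the particle cloud spread under the motion model instead of collapsing onto spurious detections. Tracking a second, distant source would, under the same assumptions, amount to running a second independent instance of this loop.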