Audio and visual signals arriving from a common source are detected using a signal-level fusion technique. A probabilistic multimodal generation model is introduced and used to derive an information-theoretic measure of cross-modal correspondence. Nonparametric statistical density modeling is used to estimate the mutual information between signals from different domains. By comparing the mutual information between different pairs of signals, it is possible to identify which person is speaking a given utterance and to discount errant motion or audio arising from other utterances or from nonspeech events.
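The comparison described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the audio and each person's visual motion have already been reduced to 1-D feature sequences (hypothetical inputs), and it uses a simple histogram-based nonparametric estimate of mutual information, whereas the original work may use a different density estimator.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Nonparametric MI estimate (in nats) from a joint histogram of two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint density estimate
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y
    nz = pxy > 0                            # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def identify_speaker(audio, motion_per_person, bins=16):
    """Score each candidate's motion signal against the audio; return the argmax."""
    scores = [mutual_information(audio, m, bins) for m in motion_per_person]
    return int(np.argmax(scores)), scores

# Synthetic example: one person's motion is correlated with the audio,
# the other's is independent (stand-ins for real extracted features).
rng = np.random.default_rng(0)
audio = rng.standard_normal(2000)
speaker = audio + 0.3 * rng.standard_normal(2000)
distractor = rng.standard_normal(2000)
idx, scores = identify_speaker(audio, [distractor, speaker])
```

The speaker's correlated motion yields a markedly higher MI score than the independent distractor, so `idx` selects person 1; independent signals score near zero up to estimator bias.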