The problem of multimodal clustering arises whenever data are gathered with several physically different sensors. Observations from different modalities are not necessarily aligned, in the sense that there is no obvious way to associate or compare them in a common space. One solution is to treat each modality as an independent clustering task. The main difficulty with this approach is guaranteeing that the unimodal clusterings are mutually consistent. In this letter, we show that multimodal clustering can be addressed within a novel framework: conjugate mixture models. These models exploit the explicit transformations that are often available between an unobserved parameter space (objects) and each of the observation spaces (sensors). We formulate the problem as a likelihood maximization task and derive the associated conjugate expectation-maximization algorithm. We thoroughly investigate the convergence properties of the proposed algorithm and describe several local and global optimization techniques that increase its convergence speed. We also propose and compare two initialization strategies and introduce a consistent model selection criterion. The algorithm and its variants are tested and evaluated on the task of 3D localization of several speakers using both auditory and visual data.
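To make the idea of tying two observation spaces to one shared parameter space concrete, here is a minimal Python sketch. It is not the algorithm from the letter: it assumes scalar object parameters, two hypothetical linear sensor mappings F(s) = a*s and G(s) = b*s + c, known noise levels, and equal mixing weights, so that the M-step has a closed form. The function names and all parameters are illustrative assumptions.

```python
import numpy as np


def responsibilities(obs, means, sigma):
    """Posterior probability of each component for each observation.

    Assumes isotropic Gaussian components with equal priors (a simplification).
    Returns an array of shape (len(obs), len(means)) whose rows sum to 1.
    """
    log_lik = -0.5 * ((obs[:, None] - means[None, :]) / sigma) ** 2
    log_lik -= log_lik.max(axis=1, keepdims=True)  # avoid underflow in exp
    lik = np.exp(log_lik)
    return lik / lik.sum(axis=1, keepdims=True)


def conjugate_em_linear(f_obs, g_obs, a, b, c, s_init,
                        sigma_f, sigma_g, n_iter=50):
    """Toy EM over shared object parameters s_n seen through two sensors.

    Hypothetical mappings (assumptions, not taken from the source):
        F(s) = a * s        for the first modality
        G(s) = b * s + c    for the second modality
    Both modalities are clustered jointly because their component means
    are functions of the same s_n.
    """
    s = np.asarray(s_init, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: one responsibility matrix per modality, shared parameters.
        r_f = responsibilities(f_obs, a * s, sigma_f)
        r_g = responsibilities(g_obs, b * s + c, sigma_g)
        # M-step: weighted least squares in s, combining both modalities.
        num = (a * (r_f * f_obs[:, None]).sum(axis=0) / sigma_f**2
               + b * (r_g * (g_obs[:, None] - c)).sum(axis=0) / sigma_g**2)
        den = (a**2 * r_f.sum(axis=0) / sigma_f**2
               + b**2 * r_g.sum(axis=0) / sigma_g**2)
        s = num / den
    return s
```

With nonlinear mappings, as is typical for real auditory and visual sensors, the closed-form update above no longer applies and the M-step would need a numerical optimizer. The sketch also omits initialization, variance estimation, and model selection.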