Multimodal information processing has received considerable attention in recent years. Existing research in this area has focused predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods based on the linear correlation model and introduce a novel method called Cross-modal Factor Analysis (CFA). We also extend our earlier work on Latent Semantic Indexing (LSI) to applications that use off-line supervised training. As a promising research direction and practical application of cross-modal association, we then discuss cross-modal information retrieval in detail: queries from one modality are used to search for content in another modality using low-level features. The association methods are tested and compared using the proposed cross-modal retrieval system. All of them achieve significant dimensionality reduction, and CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. The experiments show that cross-modal association provides useful benefits, such as noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.
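The abstract does not spell out the CFA objective, so the following is only a minimal sketch of cross-modal retrieval under a linear correlation model, not the authors' implementation. It assumes the association is learned as a pair of orthonormal projections taken from the SVD of the cross-covariance between paired, centered feature matrices, and that retrieval ranks candidates by Euclidean distance in the shared subspace. The function names, the synthetic data generator, and all parameter values are hypothetical.

```python
import numpy as np


def fit_linear_association(X, Y, k):
    """Learn projections A (p x k) and B (q x k) from paired rows of X and Y.

    Assumption: the projections are the leading left and right singular
    vectors of the cross-covariance of the centered features.
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    cross_cov = (X - mean_x).T @ (Y - mean_y)
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    A = U[:, :k]      # orthonormal columns, modality 1
    B = Vt[:k].T      # orthonormal columns, modality 2
    return mean_x, mean_y, A, B


def rank_candidates(query_x, Y_db, model):
    """Rank items of modality 2 for a query given in modality 1."""
    mean_x, mean_y, A, B = model
    q = (query_x - mean_x) @ A            # query in the shared subspace
    db = (Y_db - mean_y) @ B              # candidates in the shared subspace
    return np.argsort(np.linalg.norm(db - q, axis=1))


def demo_top1_accuracy(seed=0, n=100, d=4, p=20, q=15, noise=0.01):
    """Synthetic check: both modalities are noisy views of one latent factor."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, d))                        # shared latent factors
    Wx = np.linalg.qr(rng.standard_normal((p, d)))[0].T    # (d, p), orthonormal rows
    Wy = np.linalg.qr(rng.standard_normal((q, d)))[0].T    # (d, q), orthonormal rows
    X = Z @ Wx + noise * rng.standard_normal((n, p))
    Y = Z @ Wy + noise * rng.standard_normal((n, q))
    model = fit_linear_association(X, Y, k=d)
    hits = sum(rank_candidates(X[i], Y, model)[0] == i for i in range(n))
    return hits / n


if __name__ == "__main__":
    print("top-1 retrieval accuracy on synthetic pairs:", demo_top1_accuracy())
```

In this sketch the 20- and 15-dimensional features are compared in a 4-dimensional shared subspace, which illustrates the kind of dimensionality reduction the abstract mentions. The synthetic setup is deliberately favorable and says nothing about performance on real audio-visual features.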