Multimedia content processing through cross-modal association

Authors:
Dongge Li; Nevenka Dimitrova; Mingkun Li; Ishwar K. Sethi
Affiliations:
Motorola Labs, Schaumburg, IL; Philips Research, Briarcliff Manor, NY; Oakland University, Rochester, MI; Oakland University, Rochester, MI
Venue:
MULTIMEDIA '03: Proceedings of the Eleventh ACM International Conference on Multimedia
Year:
2003

Abstract

Multimodal information processing has received considerable attention in recent years, but existing research in this area has focused predominantly on fusion techniques. In this paper, we suggest that cross-modal association can provide a powerful new set of solutions. We investigate different cross-modal association methods based on the linear correlation model, and we introduce a novel method called Cross-modal Factor Analysis (CFA). We also extend our earlier work on Latent Semantic Indexing (LSI) to applications that use off-line supervised training. As a promising research direction and a practical application of cross-modal association, we then discuss in detail cross-modal information retrieval, in which queries from one modality are used to search for content in another modality using low-level features. The different association methods are tested and compared using the proposed cross-modal retrieval system. All of them achieve significant dimensionality reduction, and CFA gives the best retrieval performance. Finally, we apply cross-modal association to talking-head detection. CFA achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9%, respectively. The experiments show that cross-modal association offers useful benefits such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows advantages in both analysis performance and feature usage, and its feature-selection and noise-resistance capabilities make it a promising tool for many multimedia analysis applications.
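The abstract names CFA without defining it. A common statement of CFA, assumed here rather than taken from this page, is: given paired, mean-centered feature matrices X (n x p) and Y (n x q) from two modalities, find orthonormal transforms A and B that minimize ||XA - YB||_F^2. For full (square) orthogonal A and B, expanding the norm shows this is equivalent to maximizing tr(A^T X^T Y B), which the singular value decomposition X^T Y = U S V^T solves with A = U and B = V; keeping the leading k columns gives a reduced shared space. The pure-Python sketch below follows that assumption, using power iteration with deflation in place of a library SVD, and then uses the shared space for cross-modal retrieval: a query from one modality is ranked against items of the other by Euclidean distance after projection. All function names, the "audio"/"visual" labels, and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def center(rows):
    """Subtract column means; return (centered rows, means)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows], means


def cross_product(xc, yc):
    """Return X^T Y for row-major matrices X (n x p) and Y (n x q)."""
    p, q = len(xc[0]), len(yc[0])
    return [[sum(xr[i] * yr[j] for xr, yr in zip(xc, yc)) for j in range(q)]
            for i in range(p)]


def top_singular_pairs(m, k, iters=200, seed=0, eps=1e-12):
    """Leading k singular-vector pairs (u, v) of m via power iteration with deflation."""
    rng = random.Random(seed)
    m = [row[:] for row in m]  # work on a copy; deflation modifies it
    p, q = len(m), len(m[0])
    us, vs = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(q)]
        sigma = 0.0
        for _ in range(iters):
            u = [sum(m[i][j] * v[j] for j in range(q)) for i in range(p)]  # u = M v
            norm_u = math.sqrt(sum(x * x for x in u))
            if norm_u < eps:
                sigma = 0.0
                break
            u = [x / norm_u for x in u]
            v = [sum(m[i][j] * u[i] for i in range(p)) for j in range(q)]  # v = M^T u
            sigma = math.sqrt(sum(x * x for x in v))
            if sigma < eps:
                break
            v = [x / sigma for x in v]
        if sigma < eps:
            break  # remaining singular values are numerically zero
        us.append(u)
        vs.append(v)
        for i in range(p):  # deflate: M <- M - sigma * u v^T
            for j in range(q):
                m[i][j] -= sigma * u[i] * v[j]
    return us, vs


def fit_cfa(x_rows, y_rows, k):
    """Return ((x_means, A columns), (y_means, B columns)) for the assumed CFA objective."""
    xc, x_means = center(x_rows)
    yc, y_means = center(y_rows)
    us, vs = top_singular_pairs(cross_product(xc, yc), k)
    return (x_means, us), (y_means, vs)


def project(row, means, basis):
    """Center one feature row and project it onto each basis vector."""
    c = [x - m for x, m in zip(row, means)]
    return [sum(ci * bi for ci, bi in zip(c, b)) for b in basis]


def retrieve(query_y, y_model, x_database, x_model):
    """Index of the X-modality item nearest (Euclidean) to a Y-modality query."""
    qy = project(query_y, *y_model)

    def dist(i):
        px = project(x_database[i], *x_model)
        return sum((a - b) ** 2 for a, b in zip(px, qy))

    return min(range(len(x_database)), key=dist)


# Hypothetical toy data: both modalities share latent factor z; q and c are unshared.
z = [-2, -1, 0, 1, 2]
q = [2, -1, -2, -1, 2]
c = [-1, 2, 0, -2, 1]
audio = [[zi, 2 * zi, qi] for zi, qi in zip(z, q)]
visual = [[2 * zi, -zi, ci] for zi, ci in zip(z, c)]
audio_model, visual_model = fit_cfa(audio, visual, k=1)
matches = [retrieve(v, visual_model, audio, audio_model) for v in visual]
print(matches)  # [0, 1, 2, 3, 4]
```

On the toy data each visual row retrieves its own audio partner, because the unshared columns q and c have no cross-modal covariance and drop out of X^T Y. For real feature sets, a library routine such as numpy.linalg.svd could replace the power iteration; the paper's actual features, distance measure, and choice of k are not stated on this page.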