Optimal multimodal fusion for multimedia data analysis

  • Authors:
  • Yi Wu; Edward Y. Chang; Kevin Chen-Chuan Chang; John R. Smith

  • Affiliations:
  • University of California Santa Barbara, Santa Barbara, CA; University of California Santa Barbara, Santa Barbara, CA; University of Illinois at Urbana-Champaign, Urbana, IL; IBM T.J. Watson Research Center, Hawthorne, NY

  • Venue:
  • Proceedings of the 12th annual ACM international conference on Multimedia
  • Year:
  • 2004

Abstract

Considerable research has been devoted to utilizing multimodal features to better understand multimedia data. However, two core research issues have not yet been adequately addressed. First, given a set of features extracted from multiple media sources (e.g., extracted from the visual, audio, and caption track of videos), how do we determine the best modalities? Second, once a set of modalities has been identified, how do we best fuse them to map to semantics? In this paper, we propose a two-step approach. The first step finds statistically independent modalities from raw features. In the second step, we use super-kernel fusion to determine the optimal combination of individual modalities. We carefully analyze the tradeoffs between three design factors that affect fusion performance: modality independence, the curse of dimensionality, and fusion-model complexity. Through analytical and empirical studies, we demonstrate that our two-step approach, which achieves a careful balance of the three design factors, can improve class-prediction accuracy over traditional techniques.
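
To make the two-step pipeline concrete, below is a minimal sketch in Python. It assumes independent component analysis (scikit-learn's FastICA) as a stand-in for the paper's independent-modality analysis in step one, and a second-level SVM trained on per-modality SVM decision values as a stand-in for super-kernel fusion in step two. The abstract does not specify these details, so every modeling choice, function name, and parameter below is an illustrative assumption rather than the authors' actual method.

```python
# Illustrative sketch only: ICA + stacked SVMs standing in for the paper's
# independent-modality analysis and super-kernel fusion.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVC


def fuse_modalities(feature_blocks, labels, n_components=8):
    """feature_blocks: list of (n_samples, n_features) arrays, one per raw
    media source (e.g., visual, audio, caption features); labels: binary
    class labels of length n_samples."""
    # Step 1 (assumption): map each raw feature block to statistically
    # independent components via ICA.
    independent_modalities = [
        FastICA(n_components=n_components, random_state=0).fit_transform(X)
        for X in feature_blocks
    ]

    # One kernel classifier per independent modality.
    per_modality_models = [
        SVC(kernel="rbf").fit(Z, labels) for Z in independent_modalities
    ]

    # Step 2 (assumption): stack per-modality decision values and learn a
    # second-level SVM over them as the fusion model. (A real system would
    # use held-out data for the meta features; omitted here for brevity.)
    meta_features = np.column_stack([
        model.decision_function(Z)
        for model, Z in zip(per_modality_models, independent_modalities)
    ])
    fusion_model = SVC(kernel="rbf").fit(meta_features, labels)
    return independent_modalities, per_modality_models, fusion_model


if __name__ == "__main__":
    # Usage with synthetic stand-ins for the three video tracks.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=120)        # binary class labels
    visual = rng.normal(size=(120, 64))     # e.g., visual-track features
    audio = rng.normal(size=(120, 32))      # e.g., audio-track features
    caption = rng.normal(size=(120, 48))    # e.g., caption-track features
    fuse_modalities([visual, audio, caption], y)
```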