Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce

Authors:
Rong Yan;Marc-Olivier Fleury;Michele Merler;Apostol Natsev;John R. Smith
Affiliations:
IBM Research, Hawthorne, NY, USA;EPFL, Lausanne, Switzerland;Columbia University, New York, NY, USA;IBM Research, Hawthorne, NY, USA;IBM Research, Hawthorne, NY, USA
Venue:
LS-MMRM '09 Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
Year:
2009


Abstract

With the rapid growth of multimedia data, it is increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm, which augments random subspace bagging with forward model selection. Compared with traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also maps conveniently onto MapReduce, a simple parallel framework. To further improve scalability, we also develop a task scheduling algorithm that optimizes task placement for heterogeneous tasks. On a collection of more than 250,000 images and several standard TRECVID benchmark datasets, RB-SBag achieved more than a 10-fold speedup with classification performance comparable to or better than that of baseline SVMs. We also deployed the MapReduce implementation on a 16-node Hadoop cluster, where the proposed task scheduler demonstrated significantly better scalability than the baseline scheduler in the presence of task heterogeneity.
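
The combination of random subspace bagging and forward model selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid base learner stands in for the SVM base models, the selection criterion (validation accuracy of the averaged vote) is an assumption, and all function names and parameters (`train_centroid_model`, `row_frac`, `feat_frac`, and so on) are hypothetical. Each base model is trained independently on a sample of the rows and a random subset of the features, which is what would make the training step easy to distribute as independent map tasks.

```python
import random


def train_centroid_model(X, y, rows, feats):
    """Fit a nearest-centroid classifier on sampled rows and a feature subset.

    Assumption: a stand-in base learner; the paper uses SVMs as base models.
    """
    sums = {0: [0.0] * len(feats), 1: [0.0] * len(feats)}
    counts = {0: 0, 1: 0}
    for i in rows:
        counts[y[i]] += 1
        for k, f in enumerate(feats):
            sums[y[i]][k] += X[i][f]
    centroids = {c: [s / max(counts[c], 1) for s in sums[c]] for c in (0, 1)}
    return feats, centroids


def score(model, x):
    """Return 1.0 if x is closer to the positive centroid, else 0.0."""
    feats, cents = model
    dist = {
        c: sum((x[f] - cents[c][k]) ** 2 for k, f in enumerate(feats))
        for c in (0, 1)
    }
    return 1.0 if dist[1] < dist[0] else 0.0


def ensemble_accuracy(models, X, y):
    """Accuracy of the averaged vote of `models` on (X, y)."""
    correct = 0
    for x, label in zip(X, y):
        avg = sum(score(m, x) for m in models) / len(models)
        correct += int((avg >= 0.5) == (label == 1))
    return correct / len(y)


def subspace_bagging_with_selection(X_tr, y_tr, X_val, y_val,
                                    n_models=20, row_frac=0.5,
                                    feat_frac=0.5, seed=0):
    """Train a pool of subspace-bagged models, then greedily select a subset."""
    rng = random.Random(seed)
    n, d = len(X_tr), len(X_tr[0])

    # Step 1: each base model sees a bootstrap sample of the rows and a
    # random subset of the features. The models are independent of each other.
    pool = []
    for _ in range(n_models):
        rows = [rng.randrange(n) for _ in range(max(1, int(row_frac * n)))]
        feats = rng.sample(range(d), max(1, int(feat_frac * d)))
        pool.append(train_centroid_model(X_tr, y_tr, rows, feats))

    # Step 2: forward selection. Repeatedly add the model that gives the best
    # validation accuracy for the combined ensemble; stop when nothing helps.
    selected, best = [], 0.0
    remaining = list(range(n_models))
    while remaining:
        accs = {
            j: ensemble_accuracy([pool[s] for s in selected] + [pool[j]],
                                 X_val, y_val)
            for j in remaining
        }
        j_best = max(remaining, key=lambda j: accs[j])
        if selected and accs[j_best] <= best:
            break
        selected.append(j_best)
        best = accs[j_best]
        remaining.remove(j_best)
    return [pool[s] for s in selected], best
```

Stopping as soon as no candidate improves validation accuracy keeps the final ensemble small, which is one plausible way to limit overfitting to noisy subspaces; other stopping rules would work equally well in this sketch.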