An approach for processing large and non-uniform media objects on MapReduce-based clusters

Authors:
Rainer Schmidt; Matthias Rella
Affiliations:
Austrian Institute of Technology, Vienna, Austria; Austrian Institute of Technology, Vienna, Austria
Venue:
ICADL'11 Proceedings of the 13th international conference on Asia-pacific digital libraries: for cultural heritage, knowledge dissemination, and future creation
Year:
2011


Abstract

Cloud computing enables us to create applications that take advantage of large computer infrastructures on demand. Data-intensive computing frameworks leverage these technologies to generate and process large data sets on clusters of virtualized computers. In this context, MapReduce provides a highly scalable programming model that has proven widely applicable for processing structured data. In this paper, we present an approach and implementation that uses this model to process audiovisual content. The application can analyze and modify large audiovisual files on multiple compute nodes in parallel, thereby dramatically reducing processing times. The paper discusses the programming model and its application to binary data. We also summarize key concepts of the implementation and provide a brief evaluation.
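
The general idea of applying a map/reduce pattern to binary data can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use Hadoop. It assumes, purely for illustration, a hypothetical fixed frame size (`FRAME_SIZE`), so that splits can be cut on frame boundaries; real audiovisual containers have variable-size frames and would need a format-aware splitter. The function names and the per-split statistic (frame count and byte sum) are also assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

FRAME_SIZE = 4  # hypothetical fixed frame size in bytes (an assumption)


def split_on_frame_boundaries(data: bytes, frames_per_split: int):
    """Yield (offset, chunk) pairs whose boundaries fall on whole frames."""
    step = FRAME_SIZE * frames_per_split
    for offset in range(0, len(data), step):
        yield offset, data[offset:offset + step]


def map_split(split):
    """Map step: compute a partial result (frame_count, byte_sum) for one split."""
    _offset, chunk = split
    return len(chunk) // FRAME_SIZE, sum(chunk)


def reduce_partials(a, b):
    """Reduce step: combine two partial results into one."""
    return a[0] + b[0], a[1] + b[1]


def process(data: bytes, frames_per_split: int = 2, workers: int = 4):
    """Split the input, map over the splits in parallel, then reduce."""
    splits = list(split_on_frame_boundaries(data, frames_per_split))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_split, splits))
    return reduce(reduce_partials, partials, (0, 0))


if __name__ == "__main__":
    print(process(bytes(range(20))))
```

In this sketch a thread pool stands in for the cluster nodes. The point is only that each split is processed independently and the partial results are merged afterwards, which is why split boundaries must not cut through a frame.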