Adapting scientific computing problems to clouds using MapReduce

Authors:
Satish Narayana Srirama;Pelle Jakovits;Eero Vainikko
Venue:
Future Generation Computer Systems
Year:
2012

Abstract

Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-hungry scientific computing problems. To study this, we established a scientific computing cloud (SciCloud) project and environment on our internal clusters. The main goal of the project is to study the scope for establishing private clouds at universities. With such clouds, students and researchers can make efficient use of the existing resources of university computer networks to solve computationally intensive scientific, mathematical, and academic problems. However, to run scientific computing applications on cloud infrastructure, the applications must first be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. This paper summarizes the challenges associated with reducing iterative algorithms to the MapReduce model. Algorithms used in scientific computing are divided into classes according to how they can be adapted to the MapReduce model; examples from each class are reduced to the MapReduce model, and their performance is measured and analyzed. The study focuses mainly on the Hadoop MapReduce framework but also compares it with Twister, an alternative MapReduce framework designed specifically for iterative algorithms. The analysis shows that Hadoop MapReduce has significant trouble with iterative problems but is well suited to embarrassingly parallel problems, and that Twister handles iterative problems much more efficiently. This work shows how to adapt algorithms from each class to the MapReduce model and what affects the efficiency and scalability of algorithms in each class; by mapping the advantages and disadvantages of the two frameworks, it also allows us to judge which framework is more efficient for each class.
This study is of significant importance for scientific computing, which often relies on complex iterative methods to solve critical problems; adapting such methods to cloud computing frameworks is not a trivial task.
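
The contrast the abstract draws between embarrassingly parallel and iterative problems can be illustrated with a minimal Python sketch. It is not taken from the paper: `run_job` is a hypothetical in-memory stand-in for a single MapReduce job (not a Hadoop or Twister API), and the two workloads, a Monte Carlo estimate of pi and a one-dimensional k-means, are assumed examples chosen only to show the structural difference. The first needs exactly one job; the second launches a new job in every iteration, which is where per-job startup and data-reloading costs would accumulate in a framework that treats each job independently.

```python
import random
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Toy in-memory stand-in for ONE MapReduce job (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Embarrassingly parallel case: each map task samples independently,
# and a single reduce step combines the counts.
def pi_mapper(seed, samples=10_000):
    rng = random.Random(seed)  # the seed doubles as the task id (assumption)
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    yield "pi", (hits, samples)


def pi_reducer(key, values):
    hits = sum(h for h, _ in values)
    total = sum(n for _, n in values)
    return 4.0 * hits / total


def estimate_pi(num_tasks=8):
    return run_job(range(num_tasks), pi_mapper, pi_reducer)["pi"]


# Iterative case: every iteration depends on the previous result,
# so the driver must submit one job per iteration.
def kmeans_1d(points, centroids, max_iters=20):
    jobs = 0
    for _ in range(max_iters):
        current = tuple(centroids)

        def mapper(x, cs=current):
            # Emit (index of nearest centroid, point).
            nearest = min(range(len(cs)), key=lambda i: abs(x - cs[i]))
            yield nearest, x

        def reducer(key, values):
            return sum(values) / len(values)  # new centroid = cluster mean

        result = run_job(points, mapper, reducer)
        jobs += 1
        updated = [result.get(i, c) for i, c in enumerate(centroids)]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids, jobs


if __name__ == "__main__":
    print("pi estimate from one job:", estimate_pi())
    final, jobs = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
    print("k-means centroids:", final, "jobs submitted:", jobs)
```

In this toy setting the job count is the only visible cost; on a real cluster, each extra job in the iterative case would also pay for scheduling and for re-reading its input, which is the kind of overhead the abstract attributes to Hadoop MapReduce on iterative problems.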