MapReduce for Data Intensive Scientific Analyses

Authors:
Jaliya Ekanayake, Shrideep Pallickara, Geoffrey Fox
Venue:
ESCIENCE '08 Proceedings of the 2008 Fourth IEEE International Conference on eScience
Year:
2008

Cited by (42):

Distributed Response Time Analysis of GSPN Models with MapReduce. Simulation.
Evaluating SPLASH-2 Applications Using MapReduce. APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies.
Kepler + Hadoop: a general architecture facilitating data-intensive applications in scientific workflow systems. Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science.
Cloud technologies for bioinformatics applications. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers.
Max-cover in map-reduce. Proceedings of the 19th international conference on World wide web.
SPARQL basic graph pattern processing with iterative MapReduce. Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud.
A Map-Reduce System with an Alternate API for Multi-core Environments. CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
MRAP: a novel MapReduce-based framework to support HPC analytics applications with access patterns. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing.
Twister: a runtime for iterative MapReduce. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing.
MapCG: writing parallel program portable between CPU and GPU. Proceedings of the 19th international conference on Parallel architectures and compilation techniques.
Spark: cluster computing with working sets. HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing.
Scripting the cloud with skywriting. HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing.
Efficient pipelined architecture for competitive learning. Journal of Parallel and Distributed Computing.
HaLoop: efficient iterative data processing on large clusters. Proceedings of the VLDB Endowment.
Behavioral simulations in MapReduce. Proceedings of the VLDB Endowment.
Attribute reduction for massive data based on rough set theory and MapReduce. RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology.
CIEL: a universal execution engine for distributed data-flow computing. Proceedings of the 8th USENIX conference on Networked systems design and implementation.
Garbage collection auto-tuning for Java mapreduce on multi-cores. Proceedings of the international symposium on Memory management.
Just in time: adding value to the IO pipelines of high performance applications with JITStaging. Proceedings of the 20th international symposium on High performance distributed computing.
A distributed look-up architecture for text mining applications using MapReduce. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis.
An approach for processing large and non-uniform media objects on mapreduce-based clusters. ICADL'11 Proceedings of the 13th international conference on Asia-pacific digital libraries: for cultural heritage, knowledge dissemination, and future creation.
Benchmarking MapReduce Implementations for Application Usage Scenarios. GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing.
Evaluating the suitability of mapreduce for surface temperature analysis codes. Proceedings of the second international workshop on Data intensive computing in the clouds.
Parallel data processing with MapReduce: a survey. ACM SIGMOD Record.
A fully-protected large-scale email system built on map-reduce framework. GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing.
DVM: towards a datacenter-scale virtual machine. VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments.
A parallel method for computing rough set approximations. Information Sciences: an International Journal.
iMapReduce: A Distributed Computing Framework for Iterative Computation. Journal of Grid Computing.
A service-oriented taxonomical spectrum, cloudy challenges and opportunities of cloud computing. International Journal of Communication Systems.
MapIterativeReduce: a framework for reduction-intensive data processing on azure clouds. Proceedings of third international workshop on MapReduce and its Applications.
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats. CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
MapReduce approach to collective classification for networks. ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I.
Cloud-based image processing system with priority-based data distribution mechanism. Computer Communications.
Monte Carlo simulation on heterogeneous distributed systems: A computing framework with parallel merging and checkpointing strategies. Future Generation Computer Systems.
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling. ACM Transactions on Architecture and Code Optimization (TACO).
HyMR: a hybrid MapReduce workflow system. Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences.
Performance comparison under failures of MPI and MapReduce: An analytical approach. Future Generation Computer Systems.
Parallelizing the execution of sequential scripts. SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.
SIDR: structure-aware intelligent data routing in Hadoop. SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.
Achieving Accountable MapReduce in cloud computing. Future Generation Computer Systems.
Parallel skyline queries over uncertain data streams in cloud computing environments. International Journal of Web and Grid Services.
A MapReduce task scheduling algorithm for deadline constraints. Cluster Computing.

Abstract

Most scientific data analyses involve processing large volumes of data collected from various instruments. Efficient parallel/concurrent algorithms and frameworks are key to meeting the scalability and performance requirements of such analyses. The recently introduced MapReduce technique has attracted considerable attention from the scientific community for its applicability to large parallel data analyses. Although there are many evaluations of the MapReduce technique on large textual data collections, there have been only a few evaluations of it for scientific data analyses. The goals of this paper are twofold. First, we present our experience in applying the MapReduce technique to two scientific data analyses: (i) High Energy Physics data analyses and (ii) K-means clustering. Second, we present CGL-MapReduce, a streaming-based MapReduce implementation, and compare its performance with that of Hadoop.
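The abstract lists K-means clustering as one of the two analyses expressed with MapReduce. Below is a minimal single-process Python sketch showing how one K-means iteration can be split into a map step and a reduce step, with a driver loop that repeats them until the centroids stop moving. It is not the authors' CGL-MapReduce code and does not use Hadoop. The functions `nearest`, `map_assign`, `reduce_centroid`, and `kmeans_mapreduce` are hypothetical, as are the squared-Euclidean distance and the convergence tolerance.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def map_assign(chunk, centroids):
    """Map task (hypothetical): for one data partition, emit
    (centroid_id, (partial_sum, count)) pairs, one per centroid that received points."""
    partial = {}
    for point in chunk:
        cid = nearest(point, centroids)
        s, n = partial.get(cid, ([0.0] * len(point), 0))
        partial[cid] = ([a + b for a, b in zip(s, point)], n + 1)
    return list(partial.items())


def reduce_centroid(cid, values):
    """Reduce task (hypothetical): merge partial sums for one centroid into its new position."""
    total, count = None, 0
    for s, n in values:
        total = list(s) if total is None else [a + b for a, b in zip(total, s)]
        count += n
    return tuple(x / count for x in total)


def kmeans_mapreduce(chunks, centroids, max_iter=20, tol=1e-9):
    """Driver loop: run map, shuffle and reduce until no centroid moves more than tol."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Shuffle: group every map output by its key (the centroid id).
        groups = defaultdict(list)
        for chunk in chunks:
            for cid, value in map_assign(chunk, centroids):
                groups[cid].append(value)
        new = list(centroids)  # a centroid that received no points keeps its position
        for cid, values in groups.items():
            new[cid] = reduce_centroid(cid, values)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


# Two partitions ("chunks"), each of which a separate map task would process.
chunks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(kmeans_mapreduce(chunks, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Each map task emits one partial sum per centroid rather than one record per point. This keeps the intermediate data that must be shuffled to the reducers proportional to the number of centroids, not the number of points. The same data chunks are reprocessed in every iteration, which makes K-means a repeatedly applied MapReduce computation rather than a single pass.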