We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. The calling program does not need to include explicit parallel code; instead it provides "map" and "reduce" functions that operate independently on elements of a data set distributed across processors. The library performs the needed data movement between processors. We describe how typical MapReduce functionality can be implemented in an MPI context, and also in an out-of-core manner for data sets that do not fit within the aggregate memory of a parallel machine. Our motivation for creating this library was to enable graph algorithms to be written as MapReduce operations, allowing processing of terabyte-scale data sets on traditional MPI-based clusters. We outline MapReduce versions of several such algorithms: vertex ranking via PageRank, triangle finding, connected component identification, Luby's algorithm for maximal independent sets, and single-source shortest-path calculation. To test the algorithms on arbitrarily large artificial graphs we generate randomized R-MAT matrices in parallel; a MapReduce version of this operation is also described. Performance and scalability results for the various algorithms are presented for graphs of varying size on a distributed-memory cluster. For some cases, we compare the results with non-MapReduce algorithms, with runs on different machines, and with different MapReduce software, namely Hadoop. Our open-source library is written in C++, is callable from C++, C, Fortran, or scripting languages such as Python, and can run on any parallel platform that supports MPI.
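To make the division of labor concrete, the following is a minimal single-process sketch in plain Python of the map, key-grouping, and reduce stages applied to a small graph task (counting vertex degrees from an edge list). It does not use MPI and is not the library's API: the function names (`map_phase`, `collate`, `reduce_phase`, `vertex_degrees`) and the `nprocs` parameter are hypothetical, and the "processors" are simulated as in-memory buckets chosen by hashing each key. The point is only that the user supplies two small functions while a generic driver handles grouping and data movement.

```python
from collections import defaultdict


def map_phase(items, map_fn):
    """Apply the user map function to each item; it yields (key, value) pairs."""
    pairs = []
    for item in items:
        pairs.extend(map_fn(item))
    return pairs


def collate(pairs, nprocs):
    """Group values by key, sending each key to one simulated processor.

    Assumption: a real MPI implementation would exchange pairs between
    ranks here; this sketch only fills per-"processor" dictionaries.
    """
    buckets = [defaultdict(list) for _ in range(nprocs)]
    for key, value in pairs:
        buckets[hash(key) % nprocs][key].append(value)
    return buckets


def reduce_phase(buckets, reduce_fn):
    """Call the user reduce function once per unique key."""
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = reduce_fn(key, values)
    return result


# User-supplied functions for the illustrative task: vertex degree.
def edge_to_endpoints(edge):
    """Map: emit one (vertex, 1) pair for each endpoint of an edge."""
    u, v = edge
    yield (u, 1)
    yield (v, 1)


def count_degree(vertex, ones):
    """Reduce: the degree is the number of pairs emitted for this vertex."""
    return sum(ones)


def vertex_degrees(edges, nprocs=4):
    """Run map, collate, and reduce over an undirected edge list."""
    pairs = map_phase(edges, edge_to_endpoints)
    return reduce_phase(collate(pairs, nprocs), count_degree)


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
    print(vertex_degrees(edges))
```

Because every key is routed to exactly one bucket, each reduce call sees all values for its key, which is the property the graph algorithms listed above would rely on. The iterative algorithms (for example PageRank or connected components) would repeat such map and reduce stages until convergence.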