Active pebbles: parallel programming for data-driven applications

Authors:
Jeremiah James Willcock;Torsten Hoefler;Nicholas Gerard Edmonds;Andrew Lumsdaine
Affiliations:
Indiana University, Bloomington, IN, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;Indiana University, Bloomington, IN, USA;Indiana University, Bloomington, IN, USA
Venue:
Proceedings of the international conference on Supercomputing
Year:
2011

Abstract

The scope of scientific computing continues to grow and now includes diverse application areas such as network analysis, combinatorial computing, and knowledge discovery, to name just a few. Large problems in these application areas require HPC resources, but they exhibit computation and communication patterns that are irregular, fine-grained, and non-local, making it difficult to apply traditional HPC approaches to achieve scalable solutions. In this paper we present Active Pebbles, a programming and execution model developed explicitly to enable the development of scalable software for these emerging application areas. Our approach relies on five main techniques--scalable addressing, active routing, message coalescing, message reduction, and termination detection--to separate algorithm expression from communication optimization. Using this approach, algorithms can be expressed in their natural forms, with their natural levels of granularity, while optimizations necessary for scalability can be applied automatically to match the characteristics of particular machines. We implement several example kernels using both Active Pebbles and existing programming models, evaluating both programmability and performance. Our experimental results demonstrate that the Active Pebbles model can succinctly and directly express irregular application kernels, while still achieving performance comparable to MPI-based implementations that are significantly more complex.
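
Two of the techniques named in the abstract, message coalescing and message reduction, can be illustrated with a small sketch. This is not the Active Pebbles API: the class `CoalescingSender`, its methods, the `send_batch` callback, and the choice of `min` as the combining function are all hypothetical names and assumptions made for illustration. The idea shown is only that fine-grained messages are buffered per destination, messages addressed to the same key are merged before they leave the sender, and each buffer goes out as a single batch.

```python
from collections import defaultdict


class CoalescingSender:
    """Toy per-destination buffer (hypothetical, not the paper's API)."""

    def __init__(self, send_batch, capacity=4, combine=min):
        self.send_batch = send_batch      # callback(dest, list of (key, value))
        self.capacity = capacity          # distinct keys buffered per destination
        self.combine = combine            # assumed reduction for equal keys
        self.buffers = defaultdict(dict)  # dest -> {key: reduced value}

    def send(self, dest, key, value):
        buf = self.buffers[dest]
        if key in buf:
            # Message reduction: merge with the buffered message for this key.
            buf[key] = self.combine(buf[key], value)
        else:
            buf[key] = value
        if len(buf) >= self.capacity:
            self.flush(dest)

    def flush(self, dest):
        buf = self.buffers.pop(dest, None)
        if buf:
            # Message coalescing: one batch instead of many small messages.
            self.send_batch(dest, sorted(buf.items()))

    def flush_all(self):
        for dest in list(self.buffers):
            self.flush(dest)


def demo():
    batches = []
    sender = CoalescingSender(lambda d, b: batches.append((d, b)), capacity=2)
    sender.send(1, "v7", 5)
    sender.send(1, "v7", 3)  # same key: reduced with the buffered message
    sender.send(1, "v9", 8)  # second distinct key fills the buffer, so it is flushed
    sender.send(0, "v2", 1)
    sender.flush_all()       # drain whatever is still buffered
    return batches
```

In this sketch the algorithm code only calls `send` once per logical message; batching and duplicate merging happen underneath, which is the separation of algorithm expression from communication optimization that the abstract describes. A real system would also need routing and termination detection, which are not modeled here.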