Impossibility and universality results for wait-free synchronization
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
A bridging model for parallel computation
Communications of the ACM
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
CHARM++: a portable concurrent object oriented system based on C++
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Efficient algorithms for all-to-all communications in multi-port message-passing systems
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Practical parallel algorithms for personalized communication and integer sorting
Journal of Experimental Algorithmics (JEA)
LogGP: incorporating long messages into the LogP model for parallel computation
Journal of Parallel and Distributed Computing
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Graph separators, with applications
Graph separators, with applications
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
The HPC Challenge (HPCC) benchmark suite
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable communication protocols for dynamic sparse data exchange
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Active pebbles: a programming model for highly parallel fine-grained data-driven computations
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Writing parallel libraries with MPI - common practice, issues, and extensions
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Avalanche: a fine-grained flow graph model for irregular applications on distributed-memory systems
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Adoption protocols for fanout-optimal fault-tolerant termination detection
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Expressing graph algorithms using generalized active messages
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
A data-driven approach for executing the CG method on reconfigurable high-performance systems
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Bandwidth-optimal all-to-all exchanges in fat tree networks
Proceedings of the 27th international ACM conference on International conference on supercomputing
Expressing graph algorithms using generalized active messages
Proceedings of the 27th international ACM conference on International conference on supercomputing
Enabling highly-scalable remote memory access programming with MPI-3 one sided
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
The scope of scientific computing continues to grow and now includes diverse application areas such as network analysis, combinatorialcomputing, and knowledge discovery, to name just a few. Large problems in these application areas require HPC resources, but they exhibit computation and communication patterns that are irregular, fine-grained, and non-local, making it difficult to apply traditional HPC approaches to achieve scalable solutions. In this paper we present Active Pebbles, a programming and execution model developed explicitly to enable the development of scalable software for these emerging application areas. Our approach relies on five main techniques--scalable addressing, active routing, message coalescing, message reduction, and termination detection--to separate algorithm expression from communication optimization. Using this approach, algorithms can be expressed in their natural forms, with their natural levels of granularity, while optimizations necessary for scalability can be applied automatically to match the characteristics of particular machines. We implement several example kernels using both Active Pebbles and existing programming models, evaluating both programmability and performance. Our experimental results demonstrate that the Active Pebbles model can succinctly and directly express irregular application kernels, while still achieving performance comparable to MPI-based implementations that are significantly more complex.