Two algorithms for barrier synchronization
International Journal of Parallel Programming
A bridging model for parallel computation
Communications of the ACM
Optimal computation of global sensitive functions in fast networks
Proceedings of the 4th international workshop on Distributed algorithms
Optimal broadcast and summation in the LogP model
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Designing broadcasting algorithms in the Postal Model for message-passing systems
Proceedings of the 4th ACM symposium on Parallel algorithms and architectures
Optimal computation of census functions in the postal model
Discrete Applied Mathematics
Efficient Algorithms for the Reduce-Scatter Operation in LogGP
IEEE Transactions on Parallel and Distributed Systems
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
LogGP: incorporating long messages into the LogP model for parallel computation
Journal of Parallel and Distributed Computing
Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator
ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special issue on uniform random number generation
Proceedings of the fourth annual ACM symposium on Principles of distributed computing
Implementing the MPI process topology mechanism
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An Evaluation of Current High-Performance Networks
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Lifting sequential graph algorithms for distributed-memory parallel computation
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
Implementation and performance analysis of non-blocking collective operations for MPI
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Overlapping Communication and Computation with High Level Communication Routines
CCGRID '08 Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid
Proceedings of the 22nd annual international conference on Supercomputing
Overview of the IBM Blue Gene/P project
IBM Journal of Research and Development
Leveraging non-blocking collective communication in high-performance applications
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Sparse collective operations for MPI
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Transparent neutral element elimination in MPI reduction operations
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Kanor: a declarative language for explicit communication
PADL'11 Proceedings of the 13th international conference on Practical aspects of declarative languages
Active pebbles: parallel programming for data-driven applications
Proceedings of the international conference on Supercomputing
Cosmic microwave background map-making at the petascale and beyond
Proceedings of the international conference on Supercomputing
Breaking the speed and scalability barriers for graph exploration on distributed-memory machines
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compass: a scalable simulator for an architecture for cognitive computing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Adoption protocols for fanout-optimal fault-tolerant termination detection
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Enabling highly-scalable remote memory access programming with MPI-3 one sided
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.01 |
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such programs may involve all (or large numbers) of the participating processes, the actual communication operations are usually sparse in nature. As a result, communication phases are typically expressed explicitly using point-to-point communication operations or collective operations. We define the dynamic sparse data-exchange (DSDE) problem and derive bounds in the well known LogGP model. While current approaches work well with static applications, they run into limitations as modern applications grow in scale, and as the problems that are being solved become increasingly irregular and dynamic. To enable the compact and efficient expression of the communication phase, we develop suitable sparse communication protocols for irregular applications at large scale. We discuss different irregular applications and show the sparsity in the communication for real-world input data. We discuss the time and memory complexity of commonly used protocols for the DSDE problem and develop NBX--a novel fast algorithm with constant memory overhead for solving it. Algorithm NBX improves the runtime of a sparse data-exchange among 8,192 processors on BlueGene/P by a factor of 5.6. In an application study, we show improvements of up to a factor of 28.9 for a parallel breadth first search on 8,192 BlueGene/P processors.