Large-scale parallel data analysis, in which global information from a variety of problem domains is resolved in a distributed memory space, relies on communication. This paper presents three communication algorithms motivated by data-analysis workloads: merge-based reduction, swap-based reduction, and neighborhood exchange. The algorithms communicate custom data types among blocks assigned to processes in flexible ways, and their performance can be optimized through tunable parameters. Performance is benchmarked against a reference MPI implementation and against previous communication algorithms on an IBM Blue Gene/P supercomputer across a range of message sizes and process counts.
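The swap-based reduction mentioned above follows the binary-swap pattern: in each of log2(p) rounds, paired processes split the segment they currently own, exchange opposite halves, and reduce the half they keep, so that each process ends up owning a disjoint 1/p slice of the fully reduced result. The following is a minimal sequential sketch of that pattern, not the paper's implementation: it simulates p "ranks" in one process, uses elementwise summation as the reduction operator, and omits the MPI point-to-point exchanges and custom data types the actual algorithms use.

```python
import numpy as np

def binary_swap_reduce(blocks):
    """Simulate swap-based (binary-swap) reduction over p = 2^k blocks.

    After log2(p) rounds, each simulated rank owns a disjoint 1/p slice
    of the elementwise sum of all input blocks. Returns a list of
    (lo, hi, values) tuples, one per rank.
    """
    p = len(blocks)
    n = len(blocks[0])
    assert p & (p - 1) == 0 and n % p == 0, "p must be a power of two"
    data = [np.asarray(b, dtype=float) for b in blocks]  # values each rank owns
    seg = [(0, n)] * p                                   # index range each rank owns
    step = 1
    while step < p:
        nxt_data, nxt_seg = [None] * p, [None] * p
        for rank in range(p):
            partner = rank ^ step       # paired ranks own identical segments
            lo, hi = seg[rank]
            mid = (hi - lo) // 2
            if rank < partner:
                # Keep the lower half; reduce it with the partner's lower half.
                nxt_data[rank] = data[rank][:mid] + data[partner][:mid]
                nxt_seg[rank] = (lo, lo + mid)
            else:
                # Keep the upper half; reduce it with the partner's upper half.
                nxt_data[rank] = data[rank][mid:] + data[partner][mid:]
                nxt_seg[rank] = (lo + mid, hi)
        data, seg = nxt_data, nxt_seg
        step *= 2
    return [(lo, hi, vals) for (lo, hi), vals in zip(seg, data)]
```

Each round halves the message size while doubling the exchange distance, which is what distinguishes swap-based reduction from merge-based reduction, where one partner in each pair receives the other's entire block and the number of active processes halves per round.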