LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The communication challenge for MPP: Intel Paragon and Meiko CS-2
Parallel Computing
Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
Assessing Fast Network Interfaces
IEEE Micro
Bandwidth-Efficient Collective Communication for Clustered Wide Area Systems
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
LogGP: Incorporating Long Messages into the LogP Model --- One step closer towards a realistic model for parallel computation
Performance Analysis of MPI Collective Operations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
Data Transfers between Processes in an SMP System: Performance Study and Application to MPI
ICPP '06 Proceedings of the 2006 International Conference on Parallel Processing
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Performance analysis of MPI collective operations
Cluster Computing
Entering the petaflop era: the architecture and performance of Roadrunner
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Efficient high performance collective communication for the cell blade
Proceedings of the 23rd international conference on Supercomputing
Total exchange performance modelling under network contention
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
mPlogP: A Parallel Computation Model for Heterogeneous Multi-core Computer
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Hi-index | 0.00 |
This paper presents and validates performance models for a varietyvof high-performance collective communication algorithms for systems with Cell processors. The systems modeled include a single Cell processor, two Cell chips on a Cell Blade, and a cluster of Cell Blades. The models extend PLogP, the well-known point-topoint performance model, by accounting for the unique hardware characteristics of the Cell (e.g., heterogeneous interconnects and DMA engines) and by applying the model to collective communication. This paper also presents a micro-benchmark suite to accurately measure the extended PLogP parameters on the Cell Blade and then uses these parameters to model different algorithms for the barrier, broadcast, reduce, all-reduce, and all-gather collective operations. Out of 425 total performance predictions, 398 of them see less than 10% error compared to the actual execution time and all of them see less than 15%.