The number of cores in future CPUs is expected to increase steadily. Balanced CPU designs scale hardware cache-coherency functionality with the number of cores in order to minimize bottlenecks in parallel applications. An alternative approach is to do away with hardware coherence entirely; the Single-chip Cloud Computer (SCC), a 48-core experimental processor from Intel Labs, does exactly that. To support MPI on the SCC, the RCKMPI library introduced a wait-free protocol for message passing over non-coherent buffers. In this work, the message passing performance of that protocol is modeled. Additionally, a port to symmetric multiprocessors is introduced and compared against MPICH2-Nemesis and Open MPI. Performance is analyzed based on statistics collected over a 4-dimensional space composed of source rank, target rank, message size, and message frequency.