HPCC RandomAccess benchmark for next generation supercomputers

Authors:
Vikas Aggarwal;Yogish Sabharwal;Rahul Garg;Philip Heidelberger
Affiliations:
IBM India Research Lab, Plot 4, Block C, Vasant Kunj Inst. Area, New Delhi 110070, India;IBM India Research Lab, Plot 4, Block C, Vasant Kunj Inst. Area, New Delhi 110070, India;IBM India Research Lab, Plot 4, Block C, Vasant Kunj Inst. Area, New Delhi 110070, India;IBM T. J. Watson Research Center, 1101 Kitchawan Rd, Rt. 134, Yorktown Heights, NY 10598, USA
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Abstract

In this paper we examine the key elements determining the performance of the HPC Challenge RandomAccess benchmark on next generation supercomputers. We find that the performance of this benchmark is closely related to the bisection bandwidth of the underlying communication network, the performance of the integer divide operation, and details of the benchmark specification such as error tolerance and permissible multi-core mapping strategies. We demonstrate that seemingly small and innocuous changes in the benchmark can lead to significantly different system performance. We also present an algorithm to optimize the RandomAccess benchmark for multi-core systems. Our algorithm uses aggregation and software routing, and balances the load on the cores by specializing each core for one specific routing or update function. This algorithm gives approximately a factor of 3 speedup on the Blue Gene/P system, which is based on quad-core nodes.
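
To make the role of aggregation concrete, the following is a minimal single-process sketch, not the authors' algorithm. It simulates a table split across several nodes and batches updates into per-destination buckets before "sending" them. The function names, the bucket capacity, the seed, and the block distribution of the table are all illustrative assumptions. Software routing and per-core specialization are not modeled.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback term of the 64-bit shift-register generator (assumed here)


def next_random(a):
    """Advance the 64-bit pseudo-random sequence by one step."""
    high_bit = a >> 63
    return ((a << 1) & MASK64) ^ (POLY if high_bit else 0)


def run_direct(log_table_size, num_updates, seed=1):
    """Reference run: apply every XOR update to one flat table immediately."""
    size = 1 << log_table_size
    table = list(range(size))
    a = seed
    for _ in range(num_updates):
        a = next_random(a)
        table[a & (size - 1)] ^= a
    return table


def run_aggregated(log_table_size, num_nodes, num_updates,
                   bucket_capacity=4, seed=1):
    """Simulated distributed run: updates are buffered per destination node
    and delivered as one message when a bucket fills (hypothetical model)."""
    size = 1 << log_table_size
    assert size % num_nodes == 0, "sketch assumes an even block distribution"
    local_size = size // num_nodes
    tables = [list(range(n * local_size, (n + 1) * local_size))
              for n in range(num_nodes)]
    buckets = [[] for _ in range(num_nodes)]
    messages = 0

    def flush(dest):
        nonlocal messages
        if not buckets[dest]:
            return
        for value in buckets[dest]:
            offset = (value & (size - 1)) % local_size
            tables[dest][offset] ^= value
        buckets[dest].clear()
        messages += 1  # one simulated message per flushed bucket

    a = seed
    for _ in range(num_updates):
        a = next_random(a)
        index = a & (size - 1)
        # Owner lookup: an integer divide in general; it reduces to a
        # shift only when local_size is a power of two.
        dest = index // local_size
        buckets[dest].append(a)
        if len(buckets[dest]) >= bucket_capacity:
            flush(dest)

    for dest in range(num_nodes):  # drain partially filled buckets
        flush(dest)

    flat = [word for t in tables for word in t]
    return flat, messages


if __name__ == "__main__":
    flat, messages = run_aggregated(10, 4, 5000)
    print("tables match:", flat == run_direct(10, 5000))
    print("messages sent:", messages, "for 5000 updates")
```

Because XOR is commutative and associative, reordering updates through the buckets leaves the final table unchanged, while the number of simulated messages drops by roughly the bucket capacity. A larger bucket reduces message count further at the cost of more buffering per destination.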