Software routing and aggregation of messages to optimize the performance of HPCC randomaccess benchmark

Authors: Rahul Garg; Yogish Sabharwal
Affiliation: IBM India Research Lab, New Delhi, India (both authors)
Venue: Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Year: 2006


Abstract

The HPC Challenge (HPCC) benchmark suite is increasingly used to evaluate the performance of supercomputers. It augments the traditional LINPACK benchmark with six additional benchmarks, each designed to measure a specific aspect of system performance. In this paper, we analyze the HPCC RandomAccess benchmark, which is designed to measure the performance of random memory updates. We show that, on many systems, the bisection bandwidth of the network may be the performance bottleneck for this benchmark. We suggest a technique based on message aggregation and software routing that may be used to optimize this benchmark. We report the performance results obtained using this technique on the Blue Gene/L supercomputer.
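
The idea of aggregation combined with software routing can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the ranks are assumed to form a logical 2D grid (Blue Gene/L itself uses a 3D torus), the seeds are arbitrary, and the names `make_updates`, `direct`, and `routed` are hypothetical. Each update first travels within the sender's row to the destination's column, then within that column to the owning rank, and updates sharing a hop are batched into one message. Only the update rule (XOR into a power-of-two table) and the shift-register random stream follow the public RandomAccess definition.

```python
from collections import defaultdict

POLY = 0x7                 # feedback polynomial of the RandomAccess generator
MASK64 = (1 << 64) - 1     # keep values within 64 bits


def next_random(ran):
    """Advance the 64-bit shift-register stream by one step."""
    feedback = POLY if ran >> 63 else 0
    return ((ran << 1) & MASK64) ^ feedback


def make_updates(rank, count):
    """Generate `count` update values for one rank (seed is an assumption)."""
    ran = (0x9E3779B97F4A7C15 * (rank + 1)) & MASK64
    out = []
    for _ in range(count):
        ran = next_random(ran)
        out.append(ran)
    return out


def direct(rows, cols, local_size, per_rank):
    """Reference result: apply every update straight to the global table."""
    size = rows * cols * local_size          # must be a power of two
    table = list(range(size))
    for rank in range(rows * cols):
        for v in make_updates(rank, per_rank):
            table[v & (size - 1)] ^= v
    return table


def routed(rows, cols, local_size, per_rank):
    """Two-hop software routing with per-hop aggregation on a 2D grid."""
    nranks = rows * cols
    size = nranks * local_size
    table = list(range(size))
    messages = 0

    # Hop 1: bucket by destination column, send one batch per column
    # to the rank in the sender's own row.
    inbox = defaultdict(list)
    for rank in range(nranks):
        row = rank // cols
        buckets = defaultdict(list)
        for v in make_updates(rank, per_rank):
            owner = (v & (size - 1)) // local_size
            buckets[owner % cols].append(v)
        for col, batch in buckets.items():
            inbox[row * cols + col].extend(batch)
            messages += 1

    # Hop 2: each intermediate rank re-buckets by owner and forwards one
    # batch per owner within its column; the owner applies the updates.
    for mid, values in inbox.items():
        buckets = defaultdict(list)
        for v in values:
            buckets[(v & (size - 1)) // local_size].append(v)
        for owner, batch in buckets.items():
            assert owner % cols == mid % cols  # forwarding stays in-column
            messages += 1
            for v in batch:
                table[v & (size - 1)] ^= v

    return table, messages
```

Because XOR is commutative and associative, reordering updates through intermediate ranks does not change the final table, so `routed(4, 4, 16, 200)` produces the same table as `direct(4, 4, 16, 200)`. In this sketch each rank sends at most one batch per column in the first hop and at most one batch per row in the second, so the message count is bounded by `nranks * (rows + cols)` regardless of how many updates each rank generates.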