The HPC Challenge (HPCC) benchmark suite is increasingly used to evaluate the performance of supercomputers. It augments the traditional LINPACK benchmark with six additional benchmarks, each designed to measure a specific aspect of system performance. In this paper, we analyze the HPCC RandomAccess benchmark, which is designed to measure the performance of random memory updates. We show that, on many systems, the bisection bandwidth of the network may be the performance bottleneck for this benchmark. We suggest a technique based on aggregation and software routing that may be used to optimize it, and we report the performance results obtained with this technique on the Blue Gene/L supercomputer.
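To make the idea of aggregation with software routing concrete, the following is a minimal single-process sketch, not the implementation evaluated in the paper. It assumes a hypothetical virtual grid of rows x cols ranks, a block-distributed table, XOR updates indexed by value modulo table size (a simplification of the benchmark's update rule), and Python lists standing in for network messages. Each update travels in two hops: first along the sender's row to the destination's column, then along that column to the owner, with updates batched per hop so that each rank exchanges messages with at most rows + cols peers.

```python
import random
from collections import defaultdict


def direct_updates(updates, table_size):
    """Reference result: apply every XOR update to one global table."""
    table = list(range(table_size))
    for value in updates:
        table[value % table_size] ^= value
    return table


def routed_updates(per_rank_updates, num_ranks, local_size, rows, cols):
    """Apply updates via two-hop aggregated routing on a virtual grid.

    The grid shape and the row-then-column order are assumptions made
    for illustration only.
    """
    assert rows * cols == num_ranks
    table_size = num_ranks * local_size
    # Each rank owns one contiguous block of the global table.
    tables = [list(range(r * local_size, (r + 1) * local_size))
              for r in range(num_ranks)]
    messages = 0

    # Hop 1: bucket by the intermediate rank that sits in the sender's
    # row and in the destination's column; one batch per intermediate.
    inbox = defaultdict(list)
    for src, updates in enumerate(per_rank_updates):
        buckets = defaultdict(list)
        for value in updates:
            dest = (value % table_size) // local_size
            intermediate = (src // cols) * cols + (dest % cols)
            buckets[intermediate].append(value)
        for intermediate, batch in buckets.items():
            inbox[intermediate].extend(batch)
            messages += 1

    # Hop 2: each intermediate re-buckets by final owner (same column)
    # and the owner applies the whole batch to its local block.
    for updates in inbox.values():
        buckets = defaultdict(list)
        for value in updates:
            buckets[(value % table_size) // local_size].append(value)
        for dest, batch in buckets.items():
            for value in batch:
                offset = value % table_size - dest * local_size
                tables[dest][offset] ^= value
            messages += 1

    flat = [entry for block in tables for entry in block]
    return flat, messages
```

Because XOR updates commute, the routed result matches direct application regardless of delivery order. The trade-off in this sketch is that every update crosses the network twice in exchange for fewer and larger messages; whether that pays off on a given machine depends on its network, which the sketch does not model.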