Matrix transpose is a fundamental matrix operation that arises in many scientific and engineering applications. On most multiprocessor systems, communication is the main bottleneck in performing it. In this paper, we focus on torus interconnection networks and propose application-level routing techniques that improve load balancing and hence performance. Our basic idea is to route the data via carefully selected intermediate nodes. Applying this technique indiscriminately, however, may worsen congestion. We overcome this issue by rerouting only a selected set of communicating pairs. We implement our optimizations on the Blue Gene/P supercomputer and demonstrate up to 35% improvement in performance.
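The idea of rerouting only some communicating pairs can be illustrated with a small link-load model. The sketch below is not the paper's algorithm. It assumes a 2D n x n torus, X-then-Y dimension-ordered routing, a transpose pattern in which node (i, j) sends to node (j, i), and a hypothetical intermediate node (i, i) with an arbitrary selection rule. All function names are illustrative.

```python
from collections import Counter

def ring_path(a, b, n):
    """Positions visited going from a to b the shorter way round a ring of size n."""
    fwd = (b - a) % n
    step = 1 if fwd <= n - fwd else -1      # ties go in the +1 direction
    hops = fwd if step == 1 else n - fwd
    return [(a + step * k) % n for k in range(hops + 1)]

def dor_links(src, dst, n):
    """Directed links used by X-then-Y dimension-ordered routing on the torus."""
    (sx, sy), (dx, dy) = src, dst
    nodes = [(x, sy) for x in ring_path(sx, dx, n)]
    nodes += [(dx, y) for y in ring_path(sy, dy, n)[1:]]
    return list(zip(nodes, nodes[1:]))

def link_loads(routes, n):
    """Count messages per directed link; each route is [src, dst] or [src, mid, dst]."""
    loads = Counter()
    for route in routes:
        for a, b in zip(route, route[1:]):
            loads.update(dor_links(a, b, n))
    return loads

def transpose_routes(n, use_mid=lambda s, d: False):
    """Routes for the transpose pattern; pairs selected by use_mid go via a mid node."""
    routes = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                     # diagonal nodes keep their own data
            src, dst = (i, j), (j, i)
            mid = (i, i)                     # hypothetical intermediate node (assumption)
            routes.append([src, mid, dst] if use_mid(src, dst) else [src, dst])
    return routes

N = 4
direct = link_loads(transpose_routes(N), N)
# Arbitrary illustrative selection rule: reroute pairs whose source coordinates sum to an even number.
mixed = link_loads(transpose_routes(N, lambda s, d: (s[0] + s[1]) % 2 == 0), N)
```

Comparing `max(direct.values())` with `max(mixed.values())` shows how a given selection rule shifts the most heavily loaded link. With this particular intermediate node the rerouted paths keep the same hop count, so only the distribution of load across links changes, not the total.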