Compiling and optimizing for decoupled architectures

  • Authors:
  • Nigel Topham; Alasdair Rawsthorne; Callum McLean; Muriel Mewissen; Peter Bird

  • Affiliations:
  • Department of Computer Science, The University of Edinburgh, Mayfield Road, Edinburgh, UK; Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK; Department of Computer Science, The University of Edinburgh, Mayfield Road, Edinburgh, UK; Department of Computer Science, The University of Edinburgh, Mayfield Road, Edinburgh, UK; Department of Computer Science, The University of Michigan

  • Venue:
  • Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing
  • Year:
  • 1995

Abstract

Decoupled architectures provide a key to the problem of sustained supercomputer performance through their ability to hide large memory latencies. When a program executes in decoupled mode the perceived memory latency at the processor is zero; effectively, the entire physical memory has an access time equivalent to the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, incurring a high penalty. The goal of compiling and optimizing for decoupled architectures is to partition the program between the asynchronous functional units in such a way that latencies are hidden but synchronization events are executed infrequently. This paper describes a model for decoupled compilation and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that, with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
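
The abstract's central idea, partitioning a program between an access unit that issues memory operations and an execute unit that consumes their results through architectural queues, can be illustrated with a small simulation. The C sketch below is not taken from the paper: the DAXPY-style loop, the queue depth, and the use of POSIX threads to stand in for the asynchronous functional units are illustrative assumptions, and stores to y are left on the execute side for brevity.

    /*
     * Illustrative sketch only: simulates the partitioning of a simple loop
     * (y[i] += a * x[i]) between the asynchronous access and execute units
     * of a decoupled architecture.  The hardware load data queue is modelled
     * with a bounded ring buffer shared by two POSIX threads; the real ISA,
     * store queues, and compiler passes described in the paper are omitted.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define N     1024              /* loop trip count              */
    #define QSIZE 64                /* depth of the simulated queue */

    static double x[N], y[N];
    static double queue[QSIZE];     /* load data queue: x[i] values */
    static int head = 0, tail = 0, count = 0;
    static pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    /* Access partition: issues the loads and pushes operands into the
     * queue, running ahead of the execute partition to hide latency.  */
    static void *access_unit(void *arg)
    {
        (void)arg;
        for (int i = 0; i < N; i++) {
            pthread_mutex_lock(&lock);
            while (count == QSIZE)             /* queue full: stall   */
                pthread_cond_wait(&not_full, &lock);
            queue[tail] = x[i];                /* "load" x[i]         */
            tail = (tail + 1) % QSIZE;
            count++;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* Execute partition: consumes queued operands and does the
     * arithmetic; it only waits when the queue runs empty.        */
    static void *execute_unit(void *arg)
    {
        const double a = 2.0;
        (void)arg;
        for (int i = 0; i < N; i++) {
            pthread_mutex_lock(&lock);
            while (count == 0)                 /* queue empty: stall  */
                pthread_cond_wait(&not_empty, &lock);
            double xi = queue[head];
            head = (head + 1) % QSIZE;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            y[i] += a * xi;                    /* compute and store   */
        }
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

        pthread_t ap, ep;
        pthread_create(&ap, NULL, access_unit, NULL);
        pthread_create(&ep, NULL, execute_unit, NULL);
        pthread_join(ap, NULL);
        pthread_join(ep, NULL);

        printf("y[N-1] = %f\n", y[N - 1]);     /* expect 1 + 2*(N-1)  */
        return 0;
    }

Compiled with cc -pthread, the access thread slips ahead of the execute thread until the bounded queue fills, mirroring how a decoupled access processor runs ahead to hide memory latency; the waits on an empty or full queue play the role of the synchronization events that the compiler optimizations in the paper try to make infrequent.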