Compiling and optimizing for decoupled architectures

Authors:
Nigel Topham; Alasdair Rawsthorne; Callum McLean; Muriel Mewissen; Peter Bird
Affiliations:
Department of Computer Science, The University of Edinburgh, Mayfield Road, Edinburgh, UK (Topham, McLean, Mewissen)
Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK (Rawsthorne)
Department of Computer Science, The University of Michigan (Bird)
Venue:
Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing
Year:
1995

Abstract

Decoupled architectures provide a key to the problem of sustaining supercomputer performance through their ability to hide large memory latencies. When a program executes in decoupled mode, the memory latency perceived at the processor is zero: effectively, the entire physical memory has an access time equivalent to that of the processor's register file, and latency is completely hidden. However, the asynchronous functional units within a decoupled architecture must occasionally synchronize, and each synchronization incurs a high penalty. The goal of compiling and optimizing for decoupled architectures is therefore to partition the program between the asynchronous functional units so that latencies are hidden but synchronization events occur infrequently. This paper describes a model for decoupled compilation and explains the effectiveness of compilation for decoupled systems. A number of new compiler optimizations are introduced and evaluated quantitatively using the Perfect Club scientific benchmarks. We show that with a suitable repertoire of optimizations, it is possible to hide large latencies most of the time for most of the programs in the Perfect Club.
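The trade-off described above (latency hidden while the units run decoupled, but paid again at every synchronization) can be illustrated with a toy cycle-count model. This is a minimal sketch under stated assumptions, not the paper's compilation model: the names `simulate`, `DecoupledRun` and `sync_after`, the unbounded load queue, the fixed memory latency, and the one-load-per-iteration loop are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecoupledRun:
    total_cycles: int  # cycle at which the execute unit finishes the last iteration
    stall_cycles: int  # cycles the execute unit spent waiting for load data


def simulate(n_iters: int, mem_latency: int, exec_cycles: int,
             sync_after: frozenset = frozenset()) -> DecoupledRun:
    """Toy model of an access unit (AU) feeding an execute unit (EU).

    Hypothetical assumptions (illustrative only, not from the paper):
      * the AU issues at most one load per cycle into an unbounded queue;
      * every load takes `mem_latency` cycles to arrive;
      * each iteration consumes one load, then computes for `exec_cycles`;
      * an iteration index in `sync_after` is a synchronization point: the AU
        cannot issue the next load until the EU has finished that iteration.
    """
    au_issue = -1  # cycle at which the AU issued the previous load
    eu_free = 0    # cycle at which the EU finished the previous iteration
    stalls = 0
    for i in range(n_iters):
        issue = au_issue + 1
        if (i - 1) in sync_after:
            # Synchronization: the AU waits for the EU's result before it
            # can compute the next address, so decoupling is lost.
            issue = max(issue, eu_free)
        arrival = issue + mem_latency
        start = max(eu_free, arrival)  # EU waits if the data is not queued yet
        stalls += start - eu_free
        eu_free = start + exec_cycles
        au_issue = issue
    return DecoupledRun(total_cycles=eu_free, stall_cycles=stalls)


if __name__ == "__main__":
    fully_decoupled = simulate(100, mem_latency=50, exec_cycles=2)
    one_sync = simulate(100, 50, 2, sync_after=frozenset({49}))
    always_sync = simulate(100, 50, 2, sync_after=frozenset(range(100)))
    print(fully_decoupled.total_cycles, one_sync.total_cycles,
          always_sync.total_cycles)  # 250 300 5200
```

Under these assumptions the fully decoupled loop pays the memory latency once, at start-up, and then sees no load stalls; each synchronization point adds roughly one more memory latency; and synchronizing on every iteration exposes the full latency on every load. This is the pressure that pushes a compiler for such a machine to keep synchronization events rare.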