Compiling and optimizing for decoupled architectures
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A comparison of data prefetching on an access decoupled and superscalar machine
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance modeling and code partitioning for the DS architecture
Proceedings of the 25th annual international symposium on Computer architecture
Performance of the decoupled ACRI-1 architecture: the perfect club
HPCN Europe '95 Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking
Compiler techniques for evaluating and extending decoupled architectures (data prefetching)
Compiler techniques for evaluating and extending decoupled architectures (data prefetching)
OUTRIDER: efficient memory latency tolerance with decoupled strands
Proceedings of the 38th annual international symposium on Computer architecture
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Decoupled access/execute architectures seek to maximize performance by dividing a given program into two separate instruction streams and executing the streams on independent cooperating processors. The instruction streams consist of those instructions involved in generating memory accesses (the Access stream) and those that consume the data (the Execute stream). If the processor running the access stream is able to get ahead of the execute stream, then dynamic pre-loading of operands will occur and the penalty due to long latency operations (such as memory accesses) will be reduced or eliminated. Although these architectures have been around for many years, the performance analyses performed have been incomplete for want of a compiler. Very little has been published on how to construct a compiler for such an architecture. In this paper we describe the partitioning method employed in Daecomp, a compiler for decoupled access/execute processors.