In this paper, we examine the effectiveness of a new hardware mechanism, called Register Queues (RQs), which decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to hold the live values of a software-pipelined loop while minimizing the pressure placed on architected registers. We show that this decoupling can greatly increase the applicability of software pipelining, even as memory latencies increase. RQs combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules. Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining. We demonstrate the effect of combining register queues with software pipelining on 983 loops taken from the Perfect Club, the SPEC suites, and the Livermore Kernels.
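To make the decoupling idea concrete, the following is a minimal software sketch, not the hardware design evaluated in the paper. It assumes a register queue behaves as a FIFO behind a single architected register name: each write enqueues a value into a physical slot and each read dequeues the oldest one. The names `RegisterQueue`, `pipelined_loop`, and the `latency` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque


class RegisterQueue:
    """Hypothetical model: one architected register name backed by a
    FIFO of physical registers (an assumption, not the paper's design)."""

    def __init__(self, depth):
        self.depth = depth      # number of physical slots behind the name
        self.slots = deque()

    def write(self, value):
        # A write to the architected name enqueues into a fresh slot.
        if len(self.slots) == self.depth:
            raise RuntimeError("register queue full")
        self.slots.append(value)

    def read(self):
        # A read from the architected name dequeues the oldest value.
        return self.slots.popleft()


def pipelined_loop(data, latency):
    """Toy software-pipelined loop: loads run `latency` iterations
    ahead of their uses, so up to `latency` values are live at once,
    all under a single architected register name."""
    rq = RegisterQueue(depth=latency)
    out = []
    n = len(data)

    # Prologue: issue the first loads before any use.
    for i in range(min(latency, n)):
        rq.write(data[i])

    # Kernel: each step consumes the oldest value and issues a new load.
    for i in range(latency, n):
        out.append(rq.read() * 2)
        rq.write(data[i])

    # Epilogue: drain the values still in flight.
    while rq.slots:
        out.append(rq.read() * 2)
    return out


print(pipelined_loop([1, 2, 3, 4, 5], latency=3))
```

In this sketch, raising `latency` (standing in for a longer memory latency) only deepens the queue; the loop body still names one architected register, which is the kind of pressure reduction the abstract describes.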