Evaluating the Use of Register Queues in Software Pipelined Loops

Authors:
Gary S. Tyson; Mikhail Smelyanskiy; Edward S. Davidson
Affiliations:
Univ. of Michigan, Ann Arbor; Univ. of Michigan, Ann Arbor; Univ. of Michigan, Ann Arbor
Venue:
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Year:
2001


Abstract

In this paper, we examine the effectiveness of a new hardware mechanism, called Register Queues (RQs), which effectively decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers. We show that decoupling the architected register space from the physical register space can greatly increase the applicability of software pipelining, even as memory latencies increase. RQs combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules. Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining. We demonstrate the effect of incorporating register queues and software pipelining with 983 loops taken from the Perfect Club, the SPEC suites, and the Livermore Kernels.
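
The decoupling described in the abstract can be illustrated with a toy model. The sketch below is not the hardware design from the paper: the names `RegisterQueue` and `pipelined_sum`, the destructive FIFO read, and the fixed queue depth are all assumptions made for illustration. It shows one architected register name holding the live values of several overlapped loop iterations, which is the situation software pipelining creates when a producer stage runs several iterations ahead of its consumer.

```python
from collections import deque


class RegisterQueue:
    """Hypothetical model: one architected register name backed by a FIFO
    of physical registers (assumption: reads are destructive)."""

    def __init__(self, depth):
        self.depth = depth      # number of physical registers behind the name
        self.slots = deque()

    def write(self, value):
        # Each write takes a fresh physical register instead of
        # overwriting the value produced by an earlier iteration.
        if len(self.slots) >= self.depth:
            raise OverflowError("more live values than physical registers")
        self.slots.append(value)

    def read(self):
        # Values come back in the order they were produced.
        return self.slots.popleft()


def pipelined_sum(a, b, distance):
    """Compute a[i] + b[i] with the load stage running `distance`
    iterations ahead of the add stage, as in a software pipelined loop."""
    n = len(a)
    qa = RegisterQueue(distance + 1)
    qb = RegisterQueue(distance + 1)
    out = []
    for t in range(n + distance):
        if t < n:               # load stage of iteration t
            qa.write(a[t])
            qb.write(b[t])
        if t >= distance:       # add stage of iteration t - distance
            out.append(qa.read() + qb.read())
    return out


result = pipelined_sum([1, 2, 3, 4], [10, 20, 30, 40], distance=2)
```

In this model the loop body names only two registers, `qa` and `qb`, regardless of `distance`; a longer latency between producer and consumer increases the queue depth rather than the number of architected names the loop consumes.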