Modulo schedule buffers

Authors:
Matthew C. Merten;Wen-mei W. Hwu
Affiliations:
University of Illinois, Urbana, IL;University of Illinois, Urbana, IL
Venue:
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Year:
2001

Citing 13
Cited 4

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture

Selected papers of the second workshop on Languages and compilers for parallel computing
Code generation schema for modulo scheduled loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The Cydra 5 minisupercomputer: architecture and implementation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Efficient scheduling of fine grain parallelism in loops

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for control-intensive general-purpose programs

Modulo scheduling for control-intensive general-purpose programs
Integrated predicated and speculative execution in the IMPACT EPIC architecture

Proceedings of the 25th annual international symposium on Computer architecture
Effective exploitation of a zero overhead loop buffer

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Efficient Exception Handling Techniques for High-Performance Processor Architectures

Efficient Exception Handling Techniques for High-Performance Processor Architectures

Reduced code size modulo scheduling in the absence of hardware support

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
VEAL: Virtualized Execution Accelerator for Loops

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Modulo scheduling without overlapped lifetimes

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Compilation strategies for reducing code size on a VLIW processor with variable length instructions

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called the Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully-scheduled kernels, the compiler generates a sequential form of the modulo scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally; hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than the whole loop invocation.