Enhancing loop buffering of media and telecommunications applications using low-overhead predication

Authors:
John W. Sias;Hillery C. Hunter;Wen-mei W. Hwu
Affiliations:
University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign;University of Illinois, Urbana-Champaign
Venue:
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Year:
2001

Abstract

Media- and telecommunications-focused processors, increasingly designed as deeply pipelined, statically scheduled VLIWs, rely on loop buffers for low-overhead execution of simple loops. Key loops containing control flow pose a substantial problem: full predication has a high encoding overhead, and partial predication techniques do not support if-conversion, the transformation of general acyclic control flow into predicated blocks. Using a set of significant media processing benchmarks, drawn from MediaBench and contemporary telecommunications standards, we explore a compromise approach. We demonstrate a compiler that uses if-conversion and specialized loop transformations to arrange for 70-99% of fetched operations to come from a simple, statically managed 256-instruction loop buffer, saving instruction fetch power and eliminating branch penalties. To complement this, we introduce a "niche" form of predication specialized to permit general if-conversion with only a single bit in the encoding of each operation and to eliminate much of the hardware overhead of a predicate register-based approach.
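
For readers unfamiliar with if-conversion, the following minimal sketch shows the general idea on a saturating-clip loop. It is only an illustration of the transformation in the abstract's sense (control flow replaced by predicated operations in a single straight-line loop body); it does not model the paper's single-bit predication encoding or its compiler, and the function names `clip_branching` and `clip_if_converted` are hypothetical.

```python
def clip_branching(samples, lo, hi):
    """Loop whose body contains acyclic control flow (branches)."""
    out = []
    for x in samples:
        if x < lo:
            y = lo
        elif x > hi:
            y = hi
        else:
            y = x
        out.append(y)
    return out


def clip_if_converted(samples, lo, hi):
    """Same loop after a hand-applied if-conversion (illustrative only).

    The body is one straight-line block: predicates are computed first,
    then every assignment is issued on every iteration and takes effect
    only when its guarding predicate is true.
    """
    out = []
    for x in samples:
        # Predicate definitions replace the branch conditions.
        p_low = x < lo
        p_high = (not p_low) and (x > hi)
        p_mid = not (p_low or p_high)

        y = 0                       # placeholder; exactly one guard below fires
        y = lo if p_low else y      # (p_low)  y = lo
        y = hi if p_high else y     # (p_high) y = hi
        y = x if p_mid else y       # (p_mid)  y = x
        out.append(y)
    return out
```

Because the converted body has no internal branches, a loop of this shape is the kind that a simple loop buffer can hold and replay; how the predicates are encoded and evaluated in hardware is exactly the design question the paper addresses and is not captured here.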