Reducing code size in VLIW instruction scheduling

Authors:
Steve Haga; Andrew Webber; Yi Zhang; Nghi Nguyen; Rajeev Barua
Affiliations:
Department of Electrical & Computer Engineering, University of Maryland, College Park, MD 20742, USA (all authors). Corresponding author: Steve Haga, e-mail: stevhaga@eng.umd.edu
Venue:
Journal of Embedded Computing - Low-power Embedded Systems
Year:
2005


Abstract

Code size is an important concern in embedded systems. VLIW architectures are popular for embedded systems, but they often increase code size because NOPs must be inserted into the code to satisfy instruction placement constraints. Existing VLIW instruction schedulers target run-time but not code size. Indeed, current schedulers often increase code size by generating compensation copies of instructions when moving them across basic block boundaries. Our approach is the first to use the power of scheduling instructions across blocks to reduce code size, and not just run-time, for a certain class of VLIWs. We thereby show that trace scheduling, previously synonymous with increased code size, can in fact be used to reduce code size on such VLIWs. Our scheduler uses a cost-model-driven, backtracking approach: it starts from an optimal algorithm that searches the solution space in exponential time, then adds branch-and-bound techniques and non-optimal heuristics to keep the compile time reasonable (within a factor of 2). On our benchmarks, our method reduces code size by 17.6% versus the best existing across-block scheduler, while staying within 0.8% of its run-time.
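
The backtracking search with branch-and-bound pruning mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's scheduler: it works on a single basic block, assumes unit latencies and a fixed-width VLIW word (so every unused slot is a NOP and code size is proportional to the number of words), and uses a simple resource bound as its cost model. The names `min_words`, `instrs`, `deps`, and `slots` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations, product
from math import ceil


def min_words(instrs, deps, slots):
    """Toy backtracking scheduler that minimizes the number of VLIW words.

    instrs: dict mapping instruction name -> functional unit type
    deps:   dict mapping instruction name -> set of predecessor names
    slots:  dict mapping unit type -> slots of that type per VLIW word
    Assumes unit latency: an instruction may issue in any word strictly
    after the words holding all of its predecessors.
    """
    # Start with a bound worse than one instruction per word.
    best = {"count": len(instrs) + 1, "sched": None}

    def lower_bound(remaining):
        # Resource bound: a unit type with n pending instructions needs
        # at least ceil(n / slots) further words.
        counts = {}
        for i in remaining:
            counts[instrs[i]] = counts.get(instrs[i], 0) + 1
        return max((ceil(n / slots[u]) for u, n in counts.items()), default=0)

    def search(done, sched):
        remaining = [i for i in instrs if i not in done]
        if not remaining:
            if len(sched) < best["count"]:
                best["count"], best["sched"] = len(sched), [list(w) for w in sched]
            return
        # Branch-and-bound: prune if this branch cannot beat the best so far.
        if len(sched) + lower_bound(remaining) >= best["count"]:
            return
        ready = [i for i in remaining if deps.get(i, set()) <= done]
        # For each unit type, enumerate every way to fill its slots with
        # ready instructions (as many as fit), then try all combinations.
        per_unit = []
        for unit, n in slots.items():
            cands = [i for i in ready if instrs[i] == unit]
            per_unit.append(list(combinations(cands, min(n, len(cands)))))
        for choice in product(*per_unit):
            word = [i for group in choice for i in group]
            sched.append(word)
            search(done | set(word), sched)
            sched.pop()  # backtrack

    search(frozenset(), [])
    return best["count"], best["sched"]


# Hypothetical example: two ALU slots and one memory slot per word.
instrs = {"a": "alu", "b": "alu", "c": "mem", "d": "alu", "e": "mem"}
deps = {"c": {"a"}, "d": {"a", "b"}, "e": {"c"}}
slots = {"alu": 2, "mem": 1}
count, sched = min_words(instrs, deps, slots)
print(count, sched)
```

In this example the dependence chain a, c, e forces at least three words, and the search finds a three-word schedule. A scheduler like the one described in the abstract would additionally move instructions across basic block boundaries and fall back on heuristics when exhaustive search becomes too expensive; neither is modeled here.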