Code size is an important concern in embedded systems. VLIW architectures are popular for embedded systems, but they often increase code size because NOPs must be inserted into the code to satisfy instruction placement constraints. Existing VLIW instruction schedulers target run-time but not code size. Indeed, current schedulers often increase code size by generating compensation copies of instructions when moving them across basic block boundaries. Our approach, for the first time, uses the power of scheduling instructions across blocks to reduce code size and not just run-time, for a certain class of VLIWs. We therefore show that trace scheduling, previously synonymous with increased code size, can in fact be used to reduce code size on such VLIWs. Our scheduler uses a cost-model-driven, backtracking approach that starts from an optimal algorithm, which searches the solution space in exponential time, and then employs branch-and-bound techniques and non-optimal heuristics to keep the compile time reasonable (within a factor of 2). Our method reduces the code size for our benchmarks by 17.6% versus the best existing across-block scheduler, while staying within 0.8% of its run-time.
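The general idea of a backtracking search pruned by branch-and-bound can be illustrated with a toy example. The sketch below is not the scheduler described above: it handles a single basic block, assumes unit latencies and a fixed-width bundle format in which every empty slot is a NOP, and ignores compensation code and the non-optimal heuristics. The function name `min_bundles` and the dependence-map input format are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import ceil


def min_bundles(deps, width):
    """Return (bundle_count, schedule) using the fewest fixed-width bundles.

    deps maps each instruction name to the set of instructions it depends on.
    Assumption: unit latency, so a dependent instruction may issue in any
    bundle after the one holding its predecessor.
    """
    # Incumbent starts above the worst feasible case (one instruction per bundle).
    best = {"count": len(deps) + 1, "schedule": None}

    def search(done, schedule):
        remaining = len(deps) - len(done)
        if remaining == 0:
            if len(schedule) < best["count"]:
                best["count"] = len(schedule)
                best["schedule"] = list(schedule)
            return
        # Bound: even perfectly packed bundles cannot beat the incumbent.
        if len(schedule) + ceil(remaining / width) >= best["count"]:
            return
        # Instructions whose predecessors have all been scheduled already.
        ready = sorted(i for i in deps if i not in done and deps[i] <= done)
        # Branch: try fuller bundles first so a good incumbent is found early.
        for k in range(min(width, len(ready)), 0, -1):
            for group in combinations(ready, k):
                schedule.append(group)
                search(done | set(group), schedule)
                schedule.pop()  # backtrack

    search(frozenset(), [])
    return best["count"], best["schedule"]


# Hypothetical five-instruction block on a 2-wide machine.
example = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}, "e": {"c", "d"}}
count, schedule = min_bundles(example, width=2)
nop_slots = count * 2 - len(example)
```

In this toy cost model the code size is simply the bundle count times the bundle width, so minimizing bundles also minimizes NOP slots. A more realistic cost model would replace the bound and the objective, while the branch, bound, and backtrack structure would stay the same.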