Generic software pipelining at the assembly level

Authors:
Markus Pister;Daniel Kästner
Affiliations:
Saarland University & AbsInt GmbH, Saarbrücken, Germany;AbsInt GmbH, Saarbrücken, Germany
Venue:
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Year:
2005

Abstract

Software used in embedded systems is subject to strict timing and space constraints. Growing software complexity creates an urgent need for fast program execution under very limited code size. However, even modern compilers often produce code whose quality is far from optimal. The PROPAN system is a postpass optimization framework that enables high-quality machine-dependent postpass optimizers to be generated from a concise hardware specification. The postpass approach makes it possible to enhance the code quality of existing compilers and integrates smoothly into existing development tool chains. In this article we present an adaptation of the modulo scheduling software pipelining algorithm to the postpass level. The implementation is fully retargetable and has been incorporated into the PROPAN system. We outline the differences between postpass modulo scheduling and the standard version of the algorithm. Experiments conducted on the Philips TriMedia TM1000 processor demonstrate that modulo scheduling can be applied at the postpass level and achieves a significant code speedup with a moderate increase in code size.