Integer and combinatorial optimization
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Coloring heuristics for register allocation
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A novel framework of register allocation for software pipelining
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Register allocation via graph coloring
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
Designing the TFP Microprocessor
IEEE Micro
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Scheduling and mapping: software pipelining in the presence of structural hazards
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimum modulo schedules for minimum register requirements
ICS '95 Proceedings of the 9th international conference on Supercomputing
Optimal software pipelining with function unit and register constraints
A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
A Systolic Array Optimizing Compiler
A Systolic Array Optimizing Compiler
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Computers and Intractability: A Guide to the Theory of NP-Completeness
Loop Storage Optimization for Dataflow Machines
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Fine-Grain Scheduling under Resource Constraints
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
A Framework for Resource-Constrained Rate-Optimal Software Pipelining
CONPAR 94 - VAPP VI Proceedings of the Third Joint International Conference on Vector and Parallel Processing: Parallel Processing
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Efficient Algorithms for Cyclic Scheduling
Efficient Algorithms for Cyclic Scheduling
Code reuse in an optimizing compiler
Proceedings of the 11th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient formulation for optimal modulo schedulers
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
RECOD: a retiming heuristic to optimize resource and memory utilization in HW/SW codesigns
Proceedings of the 6th international workshop on Hardware/software codesign
Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Using profiling to reduce branch misprediction costs on a dynamically scheduled processor
Proceedings of the 14th international conference on Supercomputing
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
Compiling with code-size constraints
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
On achieving balanced power consumption in software pipelined loops
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Combining Loop Transformations Considering Caches and Scheduling
International Journal of Parallel Programming
PROPAN: A Retargetable System for Postpass Optimisations and Analyses
LCTES '00 Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems
Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Speculative Prefetching of Induction Pointers
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Selective Guarded Execution Using Profiling on a Dynamically Scheduled Processor
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Efficient spill code for SDRAM
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Compiling with code-size constraints
ACM Transactions on Embedded Computing Systems (TECS)
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Real-Time Imaging - Special issue on software engineering
Software pipelining: an effective scheduling technique for VLIW machines
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Differential register allocation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
A global progressive register allocator
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Allocating architected registers through differential encoding
ACM Transactions on Programming Languages and Systems (TOPLAS)
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
ACM Transactions on Embedded Computing Systems (TECS)
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Allocating rotating registers by scheduling
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
This paper is a scientific comparison of two code generation techniques with identical goals: generation of the best possible software pipelined code for computers with instruction level parallelism. Both are variants of modulo scheduling, a framework for generation of software pipelines pioneered by Rau and Glaser [RaGl81], but are otherwise quite dissimilar.

One technique was developed at Silicon Graphics and is used in the MIPSpro compiler. This is the production compiler for SGI's systems, which are based on the MIPS R8000 processor [Hsu94]. It is essentially a branch-and-bound enumeration of possible schedules with extensive pruning. This method is heuristic because of the way it prunes and also because of the interaction between register allocation and scheduling.

The second technique aims to produce optimal results by formulating the scheduling and register allocation problem as an integrated integer linear programming (ILP) problem. This idea has received much recent exposure in the literature [AlGoGa95, Feautrier94, GoAlGa94a, GoAlGa94b, Eichenberger95], but to our knowledge all previous implementations have been too preliminary for detailed measurement and evaluation. In particular, we believe this to be the first published measurement of runtime performance for ILP-based generation of software pipelines.

A particularly valuable result of this study was the evaluation of the heuristic pipelining technology in the SGI compiler. One of the motivations behind the McGill research was the hope that optimal software pipelining, while not in itself practical for use in production compilers, would be useful for their evaluation and validation. Our comparison has indeed provided a quantitative validation of the SGI compiler's pipeliner, leading us to increased confidence in both techniques.
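
Both techniques search for a schedule that satisfies the same modulo scheduling constraints at the smallest possible initiation interval (II). The sketch below illustrates those constraints only; it is neither the SGI branch-and-bound pipeliner nor the McGill ILP formulation. It uses plain exhaustive enumeration, ignores register allocation, and assumes fully pipelined function units that an operation occupies for a single cycle. All names (`is_valid_modulo_schedule`, `find_schedule`, the toy loop) are hypothetical.

```python
from itertools import product


def is_valid_modulo_schedule(start, ii, latency, deps, unit, capacity):
    """Check a candidate schedule against the two modulo scheduling constraints."""
    # Dependence constraint: for an edge (src, dst, distance), the instance of
    # dst that is `distance` iterations later must not start before src finishes.
    for src, dst, distance in deps:
        if start[dst] + ii * distance < start[src] + latency[src]:
            return False
    # Resource constraint: operations whose start times are equal modulo II
    # issue in the same cycle of the steady-state kernel, so together they
    # must not exceed the number of units of that kind.
    usage = {}
    for op, t in start.items():
        key = (unit[op], t % ii)
        usage[key] = usage.get(key, 0) + 1
        if usage[key] > capacity[unit[op]]:
            return False
    return True


def find_schedule(ops, latency, deps, unit, capacity, max_ii, horizon):
    """Try II = 1, 2, ... and return the first (ii, start) that is valid.

    Assumption: brute-force enumeration over all start times below `horizon`;
    only workable for toy loops.
    """
    for ii in range(1, max_ii + 1):
        for times in product(range(horizon), repeat=len(ops)):
            start = dict(zip(ops, times))
            if is_valid_modulo_schedule(start, ii, latency, deps, unit, capacity):
                return ii, start
    return None


# Hypothetical toy loop: acc += a[i] * k; b[i] = acc
ops = ["load", "mul", "add", "store"]
latency = {"load": 2, "mul": 2, "add": 1, "store": 1}
unit = {"load": "mem", "mul": "alu", "add": "alu", "store": "mem"}
capacity = {"mem": 1, "alu": 1}
deps = [
    ("load", "mul", 0),
    ("mul", "add", 0),
    ("add", "store", 0),
    ("add", "add", 1),  # loop-carried accumulator dependence
]

result = find_schedule(ops, latency, deps, unit, capacity, max_ii=4, horizon=8)
```

For this toy loop the search cannot succeed at II = 1, because two memory operations share a single memory unit, so the smallest feasible II is 2. An optimal method in the sense of the abstract would additionally minimize register requirements among the schedules at that II, which this sketch does not attempt.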