Cache sensitive modulo scheduling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Loop Shifting for Loop Compaction
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
RS-FDRA: a register sensitive software pipelining algorithm for embedded VLIW processors
Proceedings of the ninth international symposium on Hardware/software codesign
Instruction scheduling for clustered VLIW architectures
ISSS '00 Proceedings of the 13th international symposium on System synthesis
A comparative study of modulo scheduling techniques
ICS '02 Proceedings of the 16th international conference on Supercomputing
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory
International Journal of Parallel Programming
Loop Shifting for Loop Compaction
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Reduced code size modulo scheduling in the absence of hardware support
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling
IEEE Transactions on Parallel and Distributed Systems
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Instruction Replication for Clustered Microarchitectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Removing communications in clustered microarchitectures through instruction replication
ACM Transactions on Architecture and Code Optimization (TACO)
Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Contributions to the GNU compiler collection
IBM Systems Journal
Variable-Based Multi-module Data Caches for Clustered VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
A spill code minimization technique: application in the metrowerks starcore C compiler
International Journal of Parallel Programming
In search of a program generator to implement generic transformations for high-performance computing
Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Languages and Compilers for Parallel Computing
Integrated Modulo Scheduling for Clustered VLIW Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A simple, verified validator for software pipelining
Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Static speculation as post-link optimization for the Grid Alu processor
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
An energy-efficient patchable accelerator for post-silicon engineering changes
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A constraint based approach to cyclic RCPSP
CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
Massively parallel programming models used as hardware description languages: the OpenCL case
Proceedings of the International Conference on Computer-Aided Design
Multi-dimensional kernel generation for loop nest software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Global cyclic cumulative constraint
CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
International Journal of Organizational and Collective Intelligence
A pattern-supported parallelization approach
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
The benefits of using variable-length pipelined operations in high-level synthesis
ACM Transactions on Embedded Computing Systems (TECS)
CROSS cyclic resource-constrained scheduling solver
Artificial Intelligence
Hi-index | 0.01 |
This paper presents a novel software pipelining approach, which is called Swing Modulo Scheduling (SMS). It generates schedules that are near optimal in terms of initiation interval, register requirements and stage count. Swing Modulo Scheduling is an heuristic approach that has a low computational cost. The paper describes the technique and evaluates it for the Perfect Club benchmark suite. SMS is compared with other heuristic methods showing that it outperforms them in terms of the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimum schedules but with a huge computational cost, which makes it feasible only for very small loops. For a set of small loops, SMS obtained the optimum initiation interval in all the cases and its schedules required only 5% more registers and a 1% higher stage count than the optimum.