The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
The MorphoSys Parallel Reconfigurable System
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Mapping applications to the RaPiD configurable architecture
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study
Proceedings of the conference on Design, automation and test in Europe - Volume 2
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Operation and data mapping for CGRAs with multi-bank memory
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors
Journal of Signal Processing Systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Predictable multithread scheduling with cycle-accurate thread progress monitor
Proceedings of the 2011 ACM Symposium on Applied Computing
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
International Journal of Organizational and Collective Intelligence
Application task and data placement in embedded many-core NUMA architectures
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Hybrid compile and run-time memory management for a 3D-stacked reconfigurable accelerator
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
In high-end embedded systems, coarse-grained reconfigurable architectures (CGRA) continue to replace traditional ASIC designs. CGRAs offer high performance at a low power consumption, yet provide flexibility through programmability. In this paper we introduce a recurrence cycle-aware scheduling technique for CGRAs. Our modulo scheduler groups operations belonging to a recurrence cycle into a clustered node and then computes a scheduling order for those clustered nodes. Deadlocks that arise when two or more recurrence cycles depend on each other are resolved by using heuristics that favor recurrence cycles with long recurrence delays. While with previous work one had to sacrifice either a fast compilation speed in order to get good quality results, or vice versa, this is not necessary anymore with the proposed recurrence cycle-aware scheduling technique. We have implemented the proposed method into our in-house CGRA chip and compiler solution and show that the technique achieves better quality schedules than schedulers based on simulated annealing at a 170-fold speed increase.