Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Heuristics for register-constrained software pipelining
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Cache sensitive modulo scheduling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
The TigerSHARC DSP Architecture
IEEE Micro
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Partitioned Schedules for Clustered VLIW Architectures
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Register-Sensitive Software Pipelining
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Cluster assignment of global values for clustered VLIW processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Removing communications in clustered microarchitectures through instruction replication
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Vector Parallelism in Software Pipelined Loops
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Register saturation in instruction level parallelism
International Journal of Parallel Programming
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Compiler-assisted leakage energy optimization for clustered VLIW architectures
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Virtual Cluster Scheduling Through the Scheduling Graph
Proceedings of the International Symposium on Code Generation and Optimization
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Communication optimizations for global multi-threaded instruction scheduling
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Load scheduling: reducing pressure on distributed register files for free
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
High-performance and low-power VLIW cores for numerical computations
International Journal of High Performance Computing and Networking
A Multi-Shared Register File Structure for VLIW Processors
Journal of Signal Processing Systems
An efficient heuristic for instruction scheduling on clustered vliw processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
International Journal of Organizational and Collective Intelligence
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Hi-index | 0.00 |
Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around.In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.