Warp architecture and implementation
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
URPR—An extension of URCR for software pipelining
MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
A case study in signal processing microprogramming using the URPR software pipelining technique
MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Network-based multicomputers: redefining high performance computing in the 1990s
Proceedings of the decennial Caltech conference on VLSI on Advanced research in VLSI
A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
GURPR*: a new global software pipelining algorithm
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
A VLIW architecture for optimal execution of branch-intensive loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
GPMB—software pipelining branch-intensive loops
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A VLIW architecture based on shifting register files
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Embedded software in real-time signal processing systems: design technologies
Readings in hardware/software co-design
Hi-index | 0.00 |
This paper introduces a VLIW architecture and its optimizing compiler which are now under development. Based on the URPR software pipelining approach, the architecture integrates nine PEs with the same structure on a single-chip. In addition, a pipeline register file is used to reduce the inter-body dependent distance to enhance the overlapping of the adjacent loop iterations, furthermore to shorten the length of the optimized loop body. The pipeline register file also increases the bandwidth between PEs. The optimizing compiler is also based on the URPR software pipelining approach. It uses a two-level software pipelining method to implement phase-coupled resource allocation and code optimization, and obtains good time and space optimal results. A compilation example of an FFT innermost loop is discussed. The simulation results indicate that the architecture could reach high performance with the aid of the optimizing compiler.