An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
The Warp computer: Architecture, implementation, and performance
IEEE Transactions on Computers
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Partitioned Schedules for Clustered VLIW Architectures
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Variable-Based Multi-module Data Caches for Clustered VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Universities and the future of high-performance computing technology
AFIPS '83 Proceedings of the May 16-19, 1983, national computer conference
Integrated Modulo Scheduling for Clustered VLIW Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
The IBM System/370 vector architecture
IBM Systems Journal
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Architecture, not hardware, gives the FPS array processor family its computational speed. Here is an inside look at the trade-offs and ideas that went into its design.