We present two new results of importance for code generation and hardware synthesis targeting synchronously scheduled parallel processor arrays and multicluster VLIWs. The first is a new practical method for constructing a linear schedule for the iterations of a loop nest that schedules precisely one iteration per cycle on each of a prescribed set of processors. Although this problem dates back to the era in which systolic computation was in vogue, it has defied practical solution until now. We provide a closed-form solution that enables the enumeration of all such schedules. The second result is a new technique that reduces the cost of code or hardware whose function is to control the flow of data and predicate operations, and to generate memory addresses. The key idea is that the mathematical structure of any of the conflict-free schedules we construct yields a very shallow recurrence that updates these quantities inexpensively.
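To make the notion of a conflict-free linear schedule concrete, the following minimal sketch checks by brute force whether a candidate schedule vector ever places two iterations on the same processor in the same cycle. It is an illustration only, not the closed-form construction described above. The function names, the rectangular iteration space, and the cyclic assignment of iterations to processors (last loop index modulo the processor count) are all assumptions made for this example.

```python
from itertools import product


def schedule_conflicts(tau, bounds, num_procs):
    """Return the (slot, first_iteration, second_iteration) clashes of a linear schedule.

    tau       -- schedule vector; iteration x runs at time sum(tau[k] * x[k])
    bounds    -- loop bounds; index k ranges over 0 .. bounds[k] - 1
    num_procs -- number of processors (hypothetical cyclic mapping below)
    """
    seen = {}
    conflicts = []
    for iteration in product(*(range(n) for n in bounds)):
        # Assumed mapping: the last loop index, taken modulo num_procs,
        # selects the processor that executes this iteration.
        proc = iteration[-1] % num_procs
        time = sum(t * x for t, x in zip(tau, iteration))
        slot = (proc, time)
        if slot in seen:
            conflicts.append((slot, seen[slot], iteration))
        else:
            seen[slot] = iteration
    return conflicts


def is_conflict_free(tau, bounds, num_procs):
    """True if no processor is asked to run two iterations in the same cycle."""
    return not schedule_conflicts(tau, bounds, num_procs)


if __name__ == "__main__":
    bounds = (4, 6)   # a 4 x 6 iteration space
    num_procs = 2     # each processor handles 3 of the 6 columns
    for tau in [(3, 1), (1, 1)]:
        print(tau, is_conflict_free(tau, bounds, num_procs))
```

Under these assumptions, the vector (3, 1) passes the check, while (1, 1) fails because iterations (0, 2) and (2, 0) both land on processor 0 at time 2. Exhaustive checking of this kind scales with the size of the iteration space, which is why a closed-form characterization of all valid schedules is valuable.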