The authors address the problem of identifying optimal linear schedules for uniform dependence algorithms so that execution time is minimized. They propose procedures that solve this problem by finding the mathematical solution of a nonlinear optimization problem. The complexity of these procedures is independent of the size of the algorithm. It is exponential in the dimension of the algorithm's index set, but because that dimension is small for algorithms of practical interest, the cost is very small for all practical purposes. The authors also consider a particular class of algorithms for which the proposed solution is greatly simplified, and provide the corresponding simpler organization procedure.
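To make the optimization problem concrete, the following is a minimal sketch and not the authors' procedure. It assumes a rectangular index set with `sizes[k]` points along axis `k`, a list of constant dependence vectors `deps`, and an integer schedule vector `pi` that must satisfy `pi · d >= 1` for every dependence `d`. The execution-time formula used here (span of `pi` over the index set, divided by the smallest `pi · d`, plus one) is an assumed model, chosen because the ratio shows why the objective is nonlinear in `pi`. The function names and the search bound are hypothetical, and the bounded brute-force search stands in for the paper's mathematical solution.

```python
from itertools import product


def execution_time(pi, deps, sizes):
    """Assumed time model for a linear schedule on a rectangular index set.

    Returns None when pi violates a dependence (pi . d must be >= 1).
    """
    # Smallest schedule displacement over all dependence vectors.
    disp = min(sum(p * d for p, d in zip(pi, dep)) for dep in deps)
    if disp <= 0:
        return None
    # Largest value of pi . (j1 - j2) over the box 0 <= j_k <= sizes[k] - 1.
    span = sum(abs(p) * (n - 1) for p, n in zip(pi, sizes))
    # The ratio span / disp makes the objective nonlinear in pi.
    return span // disp + 1


def best_linear_schedule(deps, sizes, bound=3):
    """Exhaustive search over integer vectors with entries in [-bound, bound].

    The work grows as (2 * bound + 1) ** len(sizes): it depends on the
    dimension of the index set, not on the number of index points.
    """
    best = None
    for pi in product(range(-bound, bound + 1), repeat=len(sizes)):
        t = execution_time(pi, deps, sizes)
        if t is not None and (best is None or t < best[0]):
            best = (t, pi)
    return best


if __name__ == "__main__":
    # Three unit dependences on a 4 x 4 x 4 index set.
    unit_deps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(best_linear_schedule(unit_deps, (4, 4, 4)))
```

Under this assumed model, the search cost is set by the dimension of the index set and the chosen bound on the entries of `pi`, while the index set size enters only through the closed-form span. A real implementation would need a principled bound on `pi` rather than the arbitrary `bound=3` used here.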