For 2-D iteration space tiling, we address the problem of determining the tile parameters that minimize the total execution time under the BSP model. We consider uniform dependency computations, tiled so that at least one of the tile boundaries is parallel to the domain boundary. We determine the optimal tile size in closed form. In addition, we determine the optimal number of processors and the optimal slope of the oblique tile boundary. Our predictions are validated on several examples, including a sequence alignment problem specialized to similar sequences using Fickett's “k-band” algorithm, for which our optimal semi-oblique tiling improves on orthogonal tiling by a factor of 2.5. Our optimal solution requires a block-cyclic distribution of tiles to processors. The best one can obtain with only a block distribution (as many authors require) is 3 times slower.
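
To make the distinction between block and block-cyclic distribution concrete, the following is a minimal sketch of the two standard tile-to-processor mappings. It is an illustration only, not the tiling or cost model of this work: the function names `block_owner` and `block_cyclic_owner`, the `block` parameter, and the assumption that tiles are indexed along a single dimension are all hypothetical choices made for the example.

```python
def block_owner(tile, num_tiles, num_procs):
    """Block distribution: each processor owns one contiguous chunk of tiles.

    Hypothetical helper; tiles are assumed to be numbered 0..num_tiles-1
    along one dimension.
    """
    chunk = -(-num_tiles // num_procs)  # ceiling division
    return tile // chunk


def block_cyclic_owner(tile, num_procs, block=1):
    """Block-cyclic distribution: groups of `block` consecutive tiles are
    dealt to processors round-robin, wrapping around after the last one."""
    return (tile // block) % num_procs


if __name__ == "__main__":
    num_tiles, num_procs = 8, 4
    # Owner of each tile under the two mappings.
    print([block_owner(t, num_tiles, num_procs) for t in range(num_tiles)])
    print([block_cyclic_owner(t, num_procs) for t in range(num_tiles)])
```

With 8 tiles and 4 processors, the block mapping gives each processor two adjacent tiles, whereas the block-cyclic mapping with `block=1` visits the processors in order and then wraps around, so every processor receives work early in a wavefront-style computation rather than waiting for its contiguous chunk to be reached.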