In this paper, we introduce and evaluate an efficient algorithm for loop partitioning. We start from the results of Agarwal et al., whose aim is to minimize the number of data elements accessed during the computation of a tile; this number is called the cumulative footprint of the tile. We improve these results in several directions. First, we derive a new formulation of the cumulative footprint, which allows an analytical solution of the optimization problem stated by Agarwal et al. Second, we deal with arbitrary parallelepiped-shaped tiles, as opposed to the rectangular tiles considered by Agarwal et al. We design an efficient heuristic to determine the optimal tile shape in this general setting, and we demonstrate its usefulness on both the examples of Agarwal et al. and a large collection of randomly generated data.
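To make the notion of a cumulative footprint concrete, here is a minimal sketch, assuming a depth-2 loop nest in which every array reference has the form A[i0 + r0][i1 + r1] with a constant offset vector r (uniformly generated references), and a tile given by two integer edge vectors spanning a half-open parallelepiped. The names `tile_points`, `cumulative_footprint` and `best_tile` are hypothetical. The footprint is counted by brute-force enumeration, not with the analytical formulation described above, and `best_tile` is a plain exhaustive search over small edge vectors, not the paper's heuristic.

```python
from fractions import Fraction
from itertools import product

Point = tuple[int, int]


def tile_points(e1: Point, e2: Point) -> list[Point]:
    """Integer points in the half-open parallelepiped {x1*e1 + x2*e2 : 0 <= x1, x2 < 1}."""
    (a, b), (c, d) = e1, e2
    det = a * d - b * c
    if det == 0:
        raise ValueError("tile edges must be linearly independent")
    # Scan the bounding box of the four corners of the parallelepiped.
    corners = [(0, 0), e1, e2, (a + c, b + d)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    points = []
    for p0 in range(min(xs), max(xs) + 1):
        for p1 in range(min(ys), max(ys) + 1):
            # Solve x1*e1 + x2*e2 = (p0, p1) exactly with Cramer's rule.
            x1 = Fraction(d * p0 - c * p1, det)
            x2 = Fraction(a * p1 - b * p0, det)
            if 0 <= x1 < 1 and 0 <= x2 < 1:
                points.append((p0, p1))
    return points


def cumulative_footprint(e1: Point, e2: Point, offsets: list[Point]) -> int:
    """Count distinct elements A[i + r] touched while one tile executes.

    Assumption: identity access matrix with constant offsets r, so the
    footprint is the size of the union of the tile translated by each r.
    """
    touched = {
        (p0 + r0, p1 + r1)
        for (p0, p1) in tile_points(e1, e2)
        for (r0, r1) in offsets
    }
    return len(touched)


def best_tile(volume: int, offsets: list[Point], radius: int = 4):
    """Hypothetical exhaustive search: among small integer edge pairs with
    |det| == volume (same number of iterations per tile), return
    (footprint, e1, e2) with the smallest cumulative footprint."""
    best = None
    for a, b in product(range(0, radius + 1), repeat=2):
        for c, d in product(range(-radius, radius + 1), repeat=2):
            if abs(a * d - b * c) != volume:
                continue
            fp = cumulative_footprint((a, b), (c, d), offsets)
            if best is None or fp < best[0]:
                best = (fp, (a, b), (c, d))
    return best


if __name__ == "__main__":
    # Example stencil reading A[i][j], A[i-1][j] and A[i][j-1].
    offsets = [(0, 0), (-1, 0), (0, -1)]
    print(cumulative_footprint((4, 0), (0, 4), offsets))  # → 24
    print(best_tile(16, offsets))
```

A 4×4 rectangular tile executes 16 iterations but, with these three references, touches 24 distinct elements (a 5×5 block minus one corner). Because an integer parallelepiped with edge matrix P contains exactly |det P| lattice points, fixing |det| keeps the work per tile constant, so the search compares tile shapes of equal volume by footprint alone.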