This paper addresses the problem of estimating the degree of parallelism that a parallel algorithm can achieve under a bandwidth constraint. In a compilation chain for embedded parallel microprocessors, such an estimate can be used to set an appropriate target for parallelism-reduction tools. Informally, the problem consists of ordering the tasks of an algorithm and managing its memory so as to minimize the number of memory accesses. After a brief survey of the literature, we prove that this problem is NP-hard and identify a special case that is solvable in polynomial time. We then present a branch and bound procedure for the general case, together with computational results that demonstrate its practical relevance.
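To make the informal problem statement concrete, the following is a minimal sketch under several assumptions that are not taken from the paper: each task needs a fixed set of data items, all of which must sit in a local memory of `capacity` items while the task runs; tasks have no precedence constraints; and eviction follows a furthest-next-use rule in the spirit of Belady's replacement policy. The names `next_use`, `count_loads`, and `best_order` are hypothetical, and exhaustive enumeration stands in for the branch and bound procedure, which the abstract does not describe in enough detail to reproduce.

```python
from itertools import permutations


def next_use(item, order, pos, needs):
    """Position of the first task after `pos` that needs `item` (inf if none)."""
    for later in range(pos + 1, len(order)):
        if item in needs[order[later]]:
            return later
    return float("inf")


def count_loads(order, needs, capacity):
    """Count loads into local memory when tasks run in `order`.

    Assumption: every item a task needs must be resident while it runs.
    Eviction picks the resident item whose next use is furthest away.
    """
    loads = 0
    resident = set()
    for pos, task in enumerate(order):
        required = needs[task]
        if len(required) > capacity:
            raise ValueError(f"task {task!r} does not fit in local memory")
        for item in sorted(required):  # sorted for deterministic behavior
            if item in resident:
                continue
            if len(resident) >= capacity:
                # Never evict data that the current task itself needs.
                candidates = sorted(resident - required)
                victim = max(
                    candidates,
                    key=lambda d: next_use(d, order, pos, needs),
                )
                resident.remove(victim)
            resident.add(item)
            loads += 1
    return loads


def best_order(needs, capacity):
    """Exhaustively search all task orders (exponential; illustration only)."""
    return min(
        permutations(sorted(needs)),
        key=lambda order: count_loads(order, needs, capacity),
    )
```

For example, with `needs = {"t1": {"a", "b"}, "t2": {"c", "d"}, "t3": {"a", "b"}}` and `capacity = 2`, running the tasks as `t1, t2, t3` reloads `a` and `b`, whereas placing `t3` directly after `t1` reuses them. The factorial size of the search space is what makes a pruning method such as branch and bound relevant beyond toy instances.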