Cache-optimal methods for bit-reversals
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
IEEE Transactions on Parallel and Distributed Systems
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences
IEEE Transactions on Parallel and Distributed Systems
Array organization in parallel memories
International Journal of Parallel Programming
Hi-index | 0.00 |
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout oriented approach to exploit cache locality for parallel loops at run-time on Symmetric Multi-Processor (SMP) systems. Guided by application dependent hints and the targeted cache architecture, it reorganizes and partitions a parallel loop through shrinking and partitioning the memory access space of the loop at run-time. In the generated task partitions, the data sharing among partitions is minimized and data reuse in a partition is maximized. The execution of tasks in partitions is scheduled in an adaptive and locality-preserved way to achieve balanced execution, for minimizing the execution time of applications by trading off load balance and locality.Based on simulation and measurement, we show our run-time approach can achieve comparable performance with the compiler optimizations for two applications, whose load balance and cache locality can be well optimized by the tiling and other program transformations. However, our experimental results also show that our approach is able to significantly improve the memory performance for the applications with dynamic memory access patterns. This type of programs are usually hard to be optimized by compilers.