Cache-aware iteration space partitioning

Authors:
Arun Kejariwal; Alexandru Nicolau; Utpal Banerjee; Alexander V. Veidenbaum; Constantine D. Polychronopoulos
Affiliations:
UC Irvine, Irvine, USA; UC Irvine, Irvine, USA; Intel, Santa Clara, USA; UC Irvine, Irvine, USA; UIUC, Urbana-Champaign, USA
Venue:
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Year:
2008

Abstract

The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Fully exploiting the hardware parallelism these systems provide requires concurrent software. This entails, in part, parallelizing programs and efficiently mapping the parallelized program onto the different cores. The mapping determines the load balance across cores, which in turn has a direct impact on performance. Because parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of total execution time, we focus on the problem of efficiently partitioning the iteration space of parallel loops, including perfectly and imperfectly nested ones. A key aspect of this problem is capturing cache behavior, since the cache subsystem is often the main performance bottleneck in multi-core systems. In this paper, we present a novel profile-guided compiler technique for cache-aware partitioning of the iteration spaces of parallel loops. We present a case study using a kernel from the industry-standard SPEC CPU benchmark suite.
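The abstract does not spell out the partitioning algorithm itself. As a rough illustration of why cache behavior matters for load balance, the Python sketch below weights each iteration of a single one-dimensional parallel loop by an estimated cost and then splits the iteration space into contiguous chunks of roughly equal total cost. Everything here is an assumption for illustration, not the authors' technique: the cost model (base cycles plus profiled cache misses times a fixed miss penalty), the helper names `estimated_cost` and `partition_iterations`, and the greedy prefix-sum split are all hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def estimated_cost(base_cycles, misses, miss_penalty):
    """Hypothetical cost model: base cycles per iteration plus
    profiled cache misses times an assumed fixed miss penalty."""
    return [b + m * miss_penalty for b, m in zip(base_cycles, misses)]


def partition_iterations(costs, num_cores):
    """Greedily split iterations 0..n-1 into num_cores contiguous
    [start, end) chunks whose total costs are roughly equal."""
    n = len(costs)
    prefix = list(accumulate(costs))  # prefix[i] = cost of iterations 0..i
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for k in range(1, num_cores):
        target = total * k / num_cores
        # End the chunk just after the first iteration whose
        # cumulative cost reaches the k-th target.
        cut = bisect_left(prefix, target) + 1
        # Keep boundaries monotone and inside the iteration space.
        cut = max(bounds[-1], min(cut, n))
        bounds.append(cut)
    bounds.append(n)
    return [(bounds[i], bounds[i + 1]) for i in range(num_cores)]


if __name__ == "__main__":
    # Eight iterations; the last four miss in cache (hypothetical profile).
    costs = estimated_cost([10] * 8, [0, 0, 0, 0, 4, 4, 4, 4], miss_penalty=10)
    print(costs)                           # [10, 10, 10, 10, 50, 50, 50, 50]
    print(partition_iterations(costs, 2))  # [(0, 6), (6, 8)]
```

In this toy profile, a naive equal-count split, (0, 4) and (4, 8), gives the two cores costs of 40 and 200, while the cost-weighted split gives 140 and 100. Nested or non-rectangular iteration spaces would need a linearization or a multi-dimensional split, which this sketch does not attempt.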