Region-based parallelization of irregular reductions on explicitly managed memory hierarchies

Authors:
Seonggun Kim; Hwansoo Han; Kwang-Moo Choe
Affiliations:
Department of Computer Science, KAIST, Daejeon, Republic of Korea 305-701; Department of Computer Engineering, Sungkyunkwan University, Suwon, Republic of Korea 440-746; Department of Computer Science, KAIST, Daejeon, Republic of Korea 305-701
Venue:
The Journal of Supercomputing
Year:
2011




Abstract

Multicore architectures are evolving with the promise of extreme performance for classes of applications that demand high computational throughput and high memory bandwidth. Irregular reduction is an important computation pattern in many complex scientific applications, and it typically places exactly these demands on the hardware. In this article, we propose region-based parallelization techniques for irregular reductions on multicore architectures with explicitly managed memory hierarchies. Managing the memory hierarchy in software requires considerable programming effort and tends to be error-prone, and the difficulties are even greater for applications with irregular data access patterns. To relieve programmers of the burden of memory management, we develop abstractions, targeted specifically at irregular reductions, for structuring parallel tasks, mapping those tasks to processing units, and scheduling data transfers between levels of the memory hierarchy. Our framework combines iteration reordering based on regions of data with dynamic scheduling of parallel tasks. We experimentally evaluate the effectiveness of our techniques on irregular reduction kernels running on the Cell processor of a Sony PlayStation 3. Experimental results show speedups of 8 to 14 using the six available synergistic processing elements (SPEs).
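To make the pattern concrete, here is a minimal Python sketch of an irregular reduction and one possible region-based parallelization of it. It is an illustration under stated assumptions, not the authors' implementation. It assumes an edge-based kernel with indirection arrays `left` and `right` and a hypothetical pairwise function `force`. It also assumes an owner-computes scheme in which each region task updates only the nodes in its own region and recomputes edges that cross region boundaries. This is a LocalWrite-style scheme and not necessarily the exact one used in the article. The list `local` stands in for a buffer in an SPE's local store, and a shared work queue drained by six worker threads stands in for dynamic scheduling onto six SPEs. Actual DMA transfers are not modeled.

```python
import queue
import threading


def force(xu, xv):
    # Hypothetical pairwise interaction between the two endpoints of an edge.
    return xu - xv


def reduce_sequential(x, left, right):
    """Baseline irregular reduction: y is updated through indirection arrays."""
    y = [0] * len(x)
    for u, v in zip(left, right):
        f = force(x[u], x[v])
        y[u] += f
        y[v] -= f
    return y


def build_region_tasks(n_nodes, left, right, region_size):
    """Reorder edge iterations by the data regions (blocks of y) they touch.

    An edge whose endpoints lie in two different regions is placed in both
    regions' lists, so each task only ever writes to nodes it owns.
    """
    n_regions = (n_nodes + region_size - 1) // region_size
    tasks = [[] for _ in range(n_regions)]
    for e, (u, v) in enumerate(zip(left, right)):
        ru, rv = u // region_size, v // region_size
        tasks[ru].append(e)
        if rv != ru:
            tasks[rv].append(e)
    return tasks


def run_region_task(r, edges, x, left, right, y, region_size):
    lo = r * region_size
    hi = min(lo + region_size, len(y))
    local = [0] * (hi - lo)  # stands in for a local-store buffer
    for e in edges:
        u, v = left[e], right[e]
        f = force(x[u], x[v])  # remote x values would need a DMA gather
        if lo <= u < hi:
            local[u - lo] += f
        if lo <= v < hi:
            local[v - lo] -= f
    # Stands in for writing the region back; no other task owns [lo, hi).
    for k, val in enumerate(local):
        y[lo + k] += val


def reduce_region_parallel(x, left, right, region_size, n_workers=6):
    y = [0] * len(x)
    work = queue.Queue()
    for r, edges in enumerate(build_region_tasks(len(x), left, right, region_size)):
        work.put((r, edges))

    def worker():
        # Dynamic scheduling: each worker pulls the next region until none remain.
        while True:
            try:
                r, edges = work.get_nowait()
            except queue.Empty:
                return
            run_region_task(r, edges, x, left, right, y, region_size)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]
    left = [0, 1, 2, 3, 4, 5, 6, 0, 2]
    right = [1, 2, 3, 4, 5, 6, 7, 7, 5]
    print(reduce_region_parallel(x, left, right, region_size=3))
    # → [-1, -5, 1, -7, 0, 16, -11, 7]
```

Each task writes only to the slice of `y` it owns, so no locks or atomic updates are needed. The price is recomputing `force` for cross-region edges, which is why it matters to choose regions that keep most edges local. With integer inputs, as above, the parallel result matches the sequential one exactly. With floating-point data, the changed summation order can introduce small rounding differences.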