Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture

Authors:
Guangming Tan; Vugranam C. Sreedhar; Guang R. Gao
Affiliations:
Department of Electrical and Computer Engineering, University of Delaware, and Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; IBM T. J. Watson Research Center, USA; Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences
Venue:
Languages and Compilers for Parallel Computing
Year:
2008

Abstract

This paper presents a new technique for optimizing the locality of irregular programs by leveraging parallelism on a massively parallel many-core architecture, the IBM Cyclops64 (C64). The key idea is to achieve Just-In-Time Locality, which ensures that data are available locally by the time the computation needs them. The proposed percolation model for Just-In-Time Locality proactively moves data close to the computation and organizes the data layout so that locality is exploited effectively. The percolation model thereby opens the door to exploiting locality through parallelism, an advantage that future many-core architectures offer. We implemented the percolation strategy in the context of two irregular applications on C64. The experimental results are encouraging: the performance of the irregular applications improves by an order of magnitude, and the scalability of the applications studied improves drastically.
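
The idea of moving data close to the computation ahead of time can be illustrated with a small sketch. This is not the paper's C64 implementation. It is a minimal Python analogy, assuming a helper thread that gathers irregularly indexed elements into contiguous buffers while a compute thread consumes only those buffers. The names `percolate`, `compute`, `run`, `chunk`, and `depth` are hypothetical and chosen for illustration only.

```python
import queue
import threading


def percolate(data, index, chunk, out_q):
    """Hypothetical helper thread: gather irregularly indexed elements
    into small contiguous buffers ahead of the computation."""
    for start in range(0, len(index), chunk):
        # The scattered reads happen here, off the compute thread's path.
        local = [data[i] for i in index[start:start + chunk]]
        out_q.put(local)  # blocks while the bounded queue is full
    out_q.put(None)  # sentinel: no more buffers will arrive


def compute(in_q):
    """Compute thread: touches only contiguous, already-staged buffers."""
    total = 0
    while True:
        local = in_q.get()
        if local is None:
            break
        total += sum(local)
    return total


def run(data, index, chunk=4, depth=2):
    """Stage data with a helper thread and reduce it on the caller's thread.

    depth bounds how many buffers may be staged ahead of the computation
    (an assumption standing in for limited local memory)."""
    q = queue.Queue(maxsize=depth)
    helper = threading.Thread(target=percolate, args=(data, index, chunk, q))
    helper.start()
    result = compute(q)
    helper.join()
    return result
```

For example, `run(list(range(100)), [7, 3, 99, 42, 0])` sums the five scattered elements after they have been staged into contiguous buffers. The sketch shows only the structure (a staging thread running ahead of a compute thread through a bounded buffer); Python threads do not reproduce the hardware parallelism or memory hierarchy that the paper targets.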