Scheduling strategies for optimistic parallel execution of irregular programs

Authors:
Milind Kulkarni; Patrick Carribault; Keshav Pingali; Ganesh Ramanarayanan; Bruce Walter; Kavita Bala; L. Paul Chew
Affiliations:
The University of Texas at Austin, Austin, TX, USA; The University of Texas at Austin, Austin, TX, USA; The University of Texas at Austin, Austin, TX, USA; Cornell University, Ithaca, NY, USA; Cornell University, Ithaca, NY, USA; Cornell University, Ithaca, NY, USA; Cornell University, Ithaca, NY, USA
Venue:
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Year:
2008

Abstract

Recent application studies have shown that many irregular applications exhibit a generalized form of data parallelism that manifests itself as iterative computations over worklists of different kinds. In general, there are complex dependencies between iterations. These dependencies cannot be determined statically because they depend on the inputs to the program; thus, optimistic parallel execution is the only tractable approach to parallelizing these applications. We have built a system called Galois that supports this style of parallel execution. Its main features are (i) set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Our work builds on the Galois system and addresses the problem of scheduling iterations of set iterators on multiple cores. The base Galois system assigns an iteration to a core whenever that core needs work, but we show in this paper that this policy is not optimal for many applications. We also argue that OpenMP-style DO-ALL loop scheduling directives, such as chunked and guided self-scheduling, are too simplistic for irregular programs. These difficulties led us to develop a general scheduling framework for irregular problems; OpenMP-style scheduling strategies are special cases of this general approach. We also provide hooks into our framework that allow the programmer to use application knowledge to further tune a schedule for a particular application. To evaluate this framework, we implemented it as an extension of the Galois system and tested it on five real-world, irregular, data-parallel applications. Our results show that (i) the optimal scheduling policy can differ from one application to another and often relies on application-specific knowledge, and (ii) implementing these schedules in the Galois system is relatively straightforward.
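
For readers unfamiliar with the two OpenMP-style policies named in the abstract, the following is a minimal sketch of how chunked and guided self-scheduling decide how many iterations to hand to a core that asks for work. It is an illustration only and not the Galois scheduling framework or its API: the function names (chunked, guided, dispense) and the default chunk size are hypothetical, and the sketch assumes a fixed set of independent iterations, ignoring conflicts, rollbacks, and any work added to the worklist during execution.

```python
from collections import deque
from math import ceil


def chunked(remaining, num_cores, chunk_size=4):
    """Chunked self-scheduling: hand out a fixed-size chunk per request.

    chunk_size is a hypothetical default chosen for illustration.
    """
    return min(chunk_size, remaining)


def guided(remaining, num_cores):
    """Guided self-scheduling: hand out ceil(remaining / num_cores) iterations,
    so chunks start large and shrink as the worklist drains."""
    return ceil(remaining / num_cores)


def dispense(worklist, num_cores, policy):
    """Simulate successive work requests against a shared worklist.

    Returns the size of each chunk handed out, in request order.
    Assumes no iteration adds new work (a simplification).
    """
    work = deque(worklist)
    sizes = []
    while work:
        n = policy(len(work), num_cores)
        chunk = [work.popleft() for _ in range(n)]
        sizes.append(len(chunk))
    return sizes


if __name__ == "__main__":
    print("chunked:", dispense(range(10), 4, chunked))
    print("guided: ", dispense(range(16), 4, guided))
```

Both policies look only at how many iterations remain, not at which data each iteration touches. That is one way to read the abstract's claim that such directives are too simplistic for irregular programs, where the iterations depend on one another in input-dependent ways.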