Optimistic parallelism benefits from data partitioning

Authors:
Milind Kulkarni;Keshav Pingali;Ganesh Ramanarayanan;Bruce Walter;Kavita Bala;L. Paul Chew
Affiliations:
The University of Texas at Austin, Austin, TX;The University of Texas at Austin, Austin, TX;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY;Cornell University, Ithaca, NY
Venue:
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Year:
2008

Abstract

Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations. The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors. In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.
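
As a rough illustration of how data partitioning, over-decomposition, and lock coarsening might fit together, the following is a minimal single-threaded sketch. It is not the Galois API: the names `partition_nodes`, `assign_partitions`, `PartitionLocks`, and `touched_partitions` are hypothetical, the block partitioner is a stand-in for a real graph partitioner, and rollback is reduced to refusing an iteration before it modifies anything.

```python
def partition_nodes(nodes, num_partitions):
    """Hypothetical block partitioner: contiguous chunks of the sorted nodes."""
    ordered = sorted(nodes)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return {n: i // size for i, n in enumerate(ordered)}


def assign_partitions(num_partitions, num_workers):
    """Over-decomposition: more partitions than workers, dealt round-robin."""
    return {p: p % num_workers for p in range(num_partitions)}


class PartitionLocks:
    """Lock coarsening: one owner slot per partition, not per element."""

    def __init__(self):
        self.owner = {}  # partition id -> worker id currently holding it

    def try_acquire(self, worker, partitions):
        # Conflict if any needed partition is held by a different worker.
        if any(self.owner.get(p, worker) != worker for p in partitions):
            return False
        for p in partitions:
            self.owner[p] = worker
        return True

    def release_all(self, worker):
        # Called when the worker's iteration commits (or is abandoned).
        self.owner = {p: w for p, w in self.owner.items() if w != worker}


def touched_partitions(node, neighbors, part_of):
    """Partitions covering a node and its neighbors (the iteration's footprint)."""
    return {part_of[node]} | {part_of[m] for m in neighbors[node]}


# Assumed example input: a path graph 0-1-2-...-7.
neighbors = {n: [m for m in (n - 1, n + 1) if 0 <= m < 8] for n in range(8)}
part_of = partition_nodes(neighbors, num_partitions=4)      # 2 nodes per partition
home = assign_partitions(num_partitions=4, num_workers=2)   # 2 partitions per worker

locks = PartitionLocks()
# Worker 0 works on node 0: its footprint stays inside partition 0.
ok_a = locks.try_acquire(0, touched_partitions(0, neighbors, part_of))
# Worker 1 works on node 2: neighbor 1 lies in partition 0, so this conflicts.
ok_b = locks.try_acquire(1, touched_partitions(2, neighbors, part_of))
# Worker 1 works on node 6 instead: partitions 2 and 3 are free.
ok_c = locks.try_acquire(1, touched_partitions(6, neighbors, part_of))
```

In this sketch, iterations whose footprint stays inside one partition never conflict with iterations in other partitions, which is the intuition behind data-centric work assignment: a worker that mostly processes elements from its own partitions should mis-speculate less often. A real runtime would use actual synchronization and an undo mechanism rather than the check-before-modify shortcut used here.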