Detecting conflicts between structure accesses
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
A new approach to the maximum-flow problem
Journal of the ACM (JACM)
Dependence analysis for pointer variables
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
A bridging model for parallel computation
Communications of the ACM
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Extending high performance Fortran for the support of unstructured computations
ICS '95 Proceedings of the 9th international conference on Supercomputing
Solving shape-analysis problems in languages with destructive updating
ACM Transactions on Programming Languages and Systems (TOPLAS)
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
IEEE Transactions on Parallel and Distributed Systems
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Introduction to Algorithms
Parallelizing Programs with Recursive Data Structures
IEEE Transactions on Parallel and Distributed Systems
Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator
FCRC '96/WACG '96 Selected papers from the Workshop on Applied Computational Geormetry, Towards Geometric Engineering
S-HARP: A Parallel Dynamic Spectral Partitioner (A short summary)
IRREGULAR '98 Proceedings of the 5th International Symposium on Solving Irregularly Structured Problems in Parallel
Language support for lightweight transactions
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Lightcuts: a scalable approach to illumination
ACM SIGGRAPH 2005 Papers
Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Graph Cuts and Efficient N-D Image Segmentation
International Journal of Computer Vision
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Open nesting in software transactional memory
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Sparse parallel Delaunay mesh refinement
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Delaunay Triangulation with Transactions and Barriers
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Scheduling strategies for optimistic parallel execution of irregular programs
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
On the Scalability of an Automatically Parallelized Irregular Application
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Optimistic parallelism requires abstractions
Communications of the ACM - The Status of the P versus NP Problem
Parallel programming with object assemblies
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Structure-driven optimizations for amorphous data-parallel programs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Parallel inclusion-based points-to analysis
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
SpiceC: scalable parallelism via implicit copying and explicit commit
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Exploiting the commutativity lattice
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Exploiting coarse-grain speculative parallelism
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
PLDS: Partitioning linked data structures for parallelism
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Effective parallelization of loops in the presence of I/O operations
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Support for thread-level speculation into OpenMP
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Avalanche: a fine-grained flow graph model for irregular applications on distributed-memory systems
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Legion: expressing locality and independence with logical regions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism arising from the use of iterative algorithms that perform computations on elements of worklists. In some irregular applications, the computations on different elements are independent. In other applications, there may be complex patterns of dependences between these computations. The Galois system was designed to exploit this kind of irregular data parallelism on multicore processors. Its main features are (i) two kinds of set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Detection of conflicts and rolling back iterations requires information from class implementors. In this paper, we introduce mechanisms to improve the execution efficiency of Galois programs: data partitioning, data-centric work assignment, lock coarsening, and over-decomposition. These mechanisms can be used to exploit locality of reference, reduce mis-speculation, and lower synchronization overhead. We also argue that the design of the Galois system permits these mechanisms to be used with relatively little modification to the user code. Finally, we present experimental results that demonstrate the utility of these mechanisms.