A new approach to the maximum-flow problem
Journal of the ACM (JACM)
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Guaranteed-quality mesh generation for curved surfaces
SCG '93 Proceedings of the ninth annual symposium on Computational geometry
The high performance Fortran handbook
The high performance Fortran handbook
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Solving shape-analysis problems in languages with destructive updating
ACM Transactions on Programming Languages and Systems (TOPLAS)
IEEE Transactions on Parallel and Distributed Systems
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Parallelizing Programs with Recursive Data Structures
IEEE Transactions on Parallel and Distributed Systems
Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator
FCRC '96/WACG '96 Selected papers from the Workshop on Applied Computational Geormetry, Towards Geometric Engineering
Lightcuts: a scalable approach to illumination
ACM SIGGRAPH 2005 Papers
Advanced contention management for dynamic software transactional memory
Proceedings of the twenty-fourth annual ACM symposium on Principles of distributed computing
Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Graph Cuts and Efficient N-D Image Segmentation
International Journal of Computer Vision
Transactional Memory (Synthesis Lectures on Computer Architecture)
Transactional Memory (Synthesis Lectures on Computer Architecture)
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Speculative Decoupled Software Pipelining
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Optimistic parallelism benefits from data partitioning
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
On the Scalability of an Automatically Parallelized Irregular Application
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Practice of parallelizing network applications on multi-core architectures
Proceedings of the 23rd international conference on Supercomputing
Optimistic parallelism requires abstractions
Communications of the ACM - The Status of the P versus NP Problem
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
A shape analysis for optimizing parallel graph programs
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Synthesizing concurrent schedulers for irregular algorithms
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Synchronization via scheduling: techniques for efficiently managing shared state
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Partool: a feedback-directed parallelizer
APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
High quality real-time image-to-mesh conversion for finite element simulations
Proceedings of the 27th international ACM conference on International conference on supercomputing
High quality real-time Image-to-Mesh conversion for finite element simulations
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Recent application studies have shown that many irregular applications have a generalized data parallelism that manifests itself as iterative computations over worklists of different kinds. In general, there are complex dependencies between iterations. These dependencies cannot be elucidated statically because they depend on the inputs to the program; thus, optimistic parallel execution is the only tractable approach to parallelizing these applications. We have built a system called Galois that supports this style of parallel execution. Its main features are (i) set iterators for expressing worklist-based data parallelism, and (ii) a runtime system that performs optimistic parallelization of these iterators, detecting conflicts and rolling back computations as needed. Our work builds on the Galois system, and it addresses the problem of scheduling iterations of set iterators on multiple cores. The policy used by the base Galois system is to assign an iteration to a core whenever it needs work to do, but we show in this paper that this policy is not optimal for many applications. We also argue that OpenMP-style DO-ALL loop scheduling directives such as chunked and guided self-scheduling are too simplistic for irregular programs. These difficulties led us to develop a general scheduling framework for irregular problems; OpenMP-style scheduling strategies are special cases of this general approach. We also provide hooks into our framework, allowing the programmer to leverage application knowledge to further tune a schedule for a particular application. To evaluate this framework, we implemented it as an extension of the Galois system. We then tested the system using five real-world, irregular, data-parallel applications. Our results show that (i) the optimal scheduling policy can be different for different applications and often leverages application-specific knowledge and (ii) implementing these schedules in the Galois system is relatively straightforward.