The tao of parallelism in algorithms

Authors:
Keshav Pingali;Donald Nguyen;Milind Kulkarni;Martin Burtscher;M. Amber Hassaan;Rashid Kaleem;Tsung-Hsien Lee;Andrew Lenharth;Roman Manevich;Mario Méndez-Lojo;Dimitrios Prountzos;Xin Sui
Affiliations:
The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;Purdue University, West Lafayette, IN, USA;Texas State University-San Marcos, San Marcos, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA
Venue:
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Year:
2011

Abstract

For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets. To address the need for better abstractions, we introduce a data-centric formulation of algorithms called the operator formulation in which an algorithm is expressed in terms of its action on data structures. This formulation is the basis for a structural analysis of algorithms that we call tao-analysis. Tao-analysis can be viewed as an abstraction of algorithms that distills out algorithmic properties important for parallelization. It reveals that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous in algorithms, and that, depending on the tao-structure of the algorithm, this parallelism may be exploited by compile-time, inspector-executor or optimistic parallelization, thereby unifying these seemingly unrelated parallelization techniques. Regular algorithms emerge as a special case of irregular algorithms, and many application-specific optimization techniques can be generalized to a broader context. These results suggest that the operator formulation and tao-analysis of algorithms can be the foundation of a systematic approach to parallel programming.
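
To make the operator formulation more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a worklist of "active" nodes and treats edge relaxation for single-source shortest paths as the operator applied to each active node's neighborhood. The function names, the adjacency-list representation, and the choice of shortest paths as the example are illustrative assumptions; the abstract specifies none of them. The second function hints at amorphous data-parallelism: active nodes whose neighborhoods do not overlap could be processed at the same time.

```python
from collections import deque


def sssp_operator_formulation(graph, source):
    """Unordered worklist algorithm for single-source shortest paths.

    graph: dict mapping each node to a list of (neighbor, weight) pairs.
    Assumes every neighbor is also a key of graph and weights are non-negative.
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    worklist = deque([source])  # active nodes; any processing order is valid
    while worklist:
        node = worklist.popleft()
        # Operator: relax the outgoing edges of the active node.
        # Its neighborhood is the node itself plus its out-neighbors.
        for neighbor, weight in graph[node]:
            candidate = dist[node] + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                worklist.append(neighbor)  # the neighbor becomes active
    return dist


def independent_activities(graph, active_nodes):
    """Greedily select active nodes whose neighborhoods are pairwise disjoint.

    Hypothetical helper: the selected nodes touch disjoint data, so their
    operator applications would not conflict if run in parallel.
    """
    claimed = set()
    chosen = []
    for node in active_nodes:
        neighborhood = {node} | {nbr for nbr, _ in graph[node]}
        if claimed.isdisjoint(neighborhood):
            claimed |= neighborhood
            chosen.append(node)
    return chosen


if __name__ == "__main__":
    example = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2)],
        "c": [("d", 1)],
        "d": [],
    }
    print(sssp_operator_formulation(example, "a"))
    print(independent_activities(example, ["a", "c", "d"]))
```

In this sketch the set of non-conflicting activities is recomputed from the current graph state, which loosely corresponds to deciding at run time which work can proceed in parallel; a system that knows the neighborhoods in advance could make the same decision at compile time.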