The implementation of the Cilk-5 multithreaded language

Authors:
Matteo Frigo; Charles E. Leiserson; Keith H. Randall
Affiliations:
MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, Massachusetts; MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, Massachusetts; MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, Massachusetts
Venue:
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Year:
1998

Citation counts:
20 references; cited by 267

References:
MULTILISP: a language for concurrent symbolic computation

ACM Transactions on Programming Languages and Systems (TOPLAS)
VLSI Support for a cactus stack oriented memory organization

Proceedings of the Twenty-First Annual Hawaii International Conference on Architecture Track
A bridging model for parallel computation

Communications of the ACM
Introduction to algorithms

Introduction to algorithms
Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Polling efficiently on stock hardware

FPCA '93 Proceedings of the conference on Functional programming languages and computer architecture
Whole-program optimization for time and space efficient threads

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Cilk system for parallel multithreaded computing

The Cilk system for parallel multithreaded computing
Lazy threads: implementing a fast parallel call

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Cilk: an efficient multithreaded runtime system

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Executing multithreaded programs efficiently

Executing multithreaded programs efficiently
Efficient detection of determinacy races in Cilk programs

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Thread scheduling for multiprogrammed multiprocessors

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Detecting data races in Cilk programs that use locks

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The Parallel Evaluation of General Arithmetic Expressions

Journal of the ACM (JACM)
Solution of a problem in concurrent programming control

Communications of the ACM
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs

IEEE Transactions on Parallel and Distributed Systems
Parallel Symbolic Computing in Cid

PSLS '95 Proceedings of the International Workshop on Parallel Symbolic Languages and Systems
Garbage Collection is Fast, but a Stack is Faster

Garbage Collection is Fast, but a Stack is Faster
The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem

The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem

Cited by:

Thread scheduling for multiprogrammed multiprocessors

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Detecting data races in Cilk programs that use locks

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Efficient large-scale process-oriented parallel simulations

Proceedings of the 30th conference on Winter simulation
StackThreads/MP: integrating futures into calling standards

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Pointer analysis for multithreaded programs

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A fast Fourier transform compiler

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Scheduling threads for low space requirement and good locality

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing

Journal of the ACM (JACM)
Locality-preserving load-balancing mechanisms for synchronous simulations on shared-memory multiprocessors

PADS '00 Proceedings of the fourteenth workshop on Parallel and distributed simulation
A Java fork/join framework

Proceedings of the ACM 2000 conference on Java Grande
Symbolic bounds analysis of pointers, array indices, and accessed memory regions

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach

Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Pthreads for dynamic and irregular parallelism

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel randomized best-first minimax search

Artificial Intelligence
Pointer analysis for structured parallel programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
Suboptimal Minimum Cluster Volume Cover-Based Method for Measuring Fractal Dimension

IEEE Transactions on Pattern Analysis and Machine Intelligence
Evaluating the XMT Parallel Programming Model

HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky

ICCS '02 Proceedings of the International Conference on Computational Science-Part II
Next Generation System Software for Future High-End Computing Systems

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Production Job Scheduling for Parallel Shared Memory Systems

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Online Computation of Critical Paths for Multithreaded Languages

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Recursion Unrolling for Divide and Conquer Programs

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Fusion of Concurrent Invocations of Exclusive Methods

PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
A Case Study of Load Distribution in Parallel View Frustum Culling and Collision Detection

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Transparent Operating System Infrastructure for Embedding Adaptability to Thread-Based Programming Models

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Analysis of Multithreaded Programs

SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Parallel Computation: MM +/- X

Informatics - 10 Years Back. 10 Years Ahead.
Design-Driven Compilation

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic evaluation: an adaptive evaluation strategy for non-strict programs

ICFP '03 Proceedings of the eighth ACM SIGPLAN international conference on Functional programming
SilkRoad II: mixed paradigm cluster computing with RC_dag consistency

Parallel Computing
A comparative analysis of fine-grain threads packages

Journal of Parallel and Distributed Computing
Run-Time Support for the Automatic Parallelization of Java Programs

The Journal of Supercomputing
A fast Fourier transform compiler

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Parallel and fully recursive multifrontal sparse Cholesky

Future Generation Computer Systems - Special issue: Selected numerical algorithms
On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs

Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Transparent proxies for Java futures

OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Adding parallelism to visual data flow programs

SoftVis '05 Proceedings of the 2005 ACM symposium on Software visualization
Symbolic bounds analysis of pointers, array indices, and accessed memory regions

ACM Transactions on Programming Languages and Systems (TOPLAS)
A generic approach to parallel chart parsing with an application to LinGO

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Adaptive scheduling with parallelism feedback

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
SmartApps: middle-ware for adaptive applications on reconfigurable platforms

ACM SIGOPS Operating Systems Review
The cache complexity of multithreaded cache oblivious algorithms

Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Programming with exceptions in JCilk

Science of Computer Programming - Special issue: Synchronization and concurrency in object-oriented languages
Adaptive work stealing with parallelism feedback

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling DAGs on asynchronous processors

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Carbon: architectural support for fine-grained parallelism on chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Iterative context bounding for systematic testing of multithreaded programs

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Multithreaded programming in Cilk

Proceedings of the 2007 international workshop on Parallel symbolic computation
Adaptive loops with Kaapi on multicore and grid: applications in symmetric cryptography

Proceedings of the 2007 international workshop on Parallel symbolic computation
Probabilistic certification of divide & conquer algorithms on global computing platforms: application to fault-tolerant exact matrix-vector product

Proceedings of the 2007 international workshop on Parallel symbolic computation
A formal model of a system for automated program parallelization

Programming and Computer Software
Supporting exception handling for futures in Java

Proceedings of the 5th international symposium on Principles and practice of programming in Java
Parallel unsymmetric-pattern multifrontal sparse LU with column preordering

ACM Transactions on Mathematical Software (TOMS)
Streamware: programming general-purpose multicore processors using streams

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Measuring and Evaluating Parallel State-Space Exploration Algorithms

Electronic Notes in Theoretical Computer Science (ENTCS)
Fair stateless model checking

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Adaptive work-stealing with parallelism feedback

ACM Transactions on Computer Systems (TOCS)
Parallelization, performance analysis, and algorithm consideration of Hough transform on chip multiprocessors

ACM SIGARCH Computer Architecture News
A scheduling framework for general-purpose parallel languages

Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
An adaptive cut-off for task parallelism

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Fine Grain Distributed Implementation of a Dataflow Language with Provable Performances

ICCS '07 Proceedings of the 7th international conference on Computational Science, Part II
A Proposal for Task Parallelism in OpenMP

IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
An Efficient OpenMP Runtime System for Hierarchical Architectures

IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Deque-Free Work-Optimal Parallel STL Algorithms

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
An Experimental Evaluation of the New OpenMP Tasking Model

Languages and Compilers for Parallel Computing
Low-pain, high-gain multicore programming in Haskell: coordinating irregular symbolic computations on multicore architectures

Proceedings of the 4th workshop on Declarative aspects of multicore programming
gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments

Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Idempotent work stealing

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Backtracking-based load balancing

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Serialization sets: a dynamic dependence-based parallel execution model

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Effective performance measurement and analysis of multithreaded applications

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Architectural support for Cilk computations on many-core architectures

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
CellSs: Scheduling techniques to better exploit memory hierarchy

Scientific Programming - High Performance Computing with the Cell Broadband Engine
Kendo: efficient deterministic multithreading in software

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
As-if-serial exception handling semantics for Java futures

Science of Computer Programming
A Unified Runtime System for Heterogeneous Multi-core Architectures

Euro-Par 2008 Workshops - Parallel Processing
PetaBricks: a language and compiler for algorithmic choice

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Wool-A work stealing library

ACM SIGARCH Computer Architecture News
Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Hierarchical Task-Based Programming With StarSs

International Journal of High Performance Computing Applications
Reducers and other Cilk++ hyperobjects

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Brief announcement: a lower bound for depth-restricted work stealing

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Experience with SC: transformation-based implementation of various extensions to C

Proceedings of the 2007 International Lisp Conference
Flexible filters: load balancing through backpressure for stream programs

EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
The Cilk++ concurrency platform

Proceedings of the 46th Annual Design Automation Conference
The design of a task parallel library

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Autotuning multigrid with PetaBricks

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
PFunc: modern task parallelism for modern high performance computing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Scalable work stealing

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Lazy binary-splitting: a run-time adaptive work-stealing scheduler

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Helper locks for fork-join parallel programming

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Analyzing lock contention in multithreaded applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Performance Evaluation of Work Stealing for Streaming Applications

OPODIS '09 Proceedings of the 13th International Conference on Principles of Distributed Systems
Race-free and memory-safe multithreading: design and implementation in Cyclone

Proceedings of the 5th ACM SIGPLAN workshop on Types in language design and implementation
Lightweight asynchrony using parasitic threads

Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
Satin: A high-level and efficient grid programming model

ACM Transactions on Programming Languages and Systems (TOPLAS)
A randomized scheduler with probabilistic guarantees of finding bugs

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Flexible architectural support for fine-grain scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
The Cilk++ concurrency platform

The Journal of Supercomputing
An approach for non-intrusively adding malleable fork/join parallelism into ordinary JavaBean compliant applications

Computer Languages, Systems and Structures
UTS: an unbalanced tree search benchmark

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
STAPL: an adaptive, generic parallel C++ library

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Parallelising symbolic state-space generators

CAV'07 Proceedings of the 19th international conference on Computer aided verification
An adaptive task creation strategy for work-stealing scheduling

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Memory models: a case for rethinking parallel languages and hardware

Communications of the ACM
Evaluation of OpenMP task scheduling strategies

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Extending the OpenMP tasking model to allow dependent tasks

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Scheduling dynamic OpenMP applications over multicore architectures

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Bamboo: a data-centric, object-oriented approach to many-core software

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The Cilkview scalability analyzer

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Brief announcement: serial-parallel reciprocity in dynamic multithreaded languages

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers)

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Simplifying concurrent algorithms by exploiting hardware transactional memory

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Online mapping of MPI-2 dynamic tasks to processes and threads

International Journal of High Performance Systems Architecture
A mean field model of work stealing in large-scale systems

Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
STAPL: standard template adaptive parallel library

Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Cohesion: a hybrid memory model for accelerators

Proceedings of the 37th annual international symposium on Computer architecture
Balanced dense polynomial multiplication on multi-cores

ACM Communications in Computer Algebra
Parallel computation of the minimal elements of a poset

Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Hardware/software support for adaptive work-stealing in on-chip multiprocessor

Journal of Systems Architecture: the EUROMICRO Journal
Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Using memory mapping to support cactus stacks in work-stealing runtime systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Gossamer: a lightweight programming framework for multicore machines

HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Lazy tree splitting

Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Concurrent programming with revisions and isolation types

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Concurrency by modularity: design patterns, a case in point

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
A generic platform for estimation of multi-threaded program performance on heterogeneous multiprocessors

Proceedings of the Conference on Design, Automation and Test in Europe
Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Building scalable software systems in the multicore era

Proceedings of the FSE/SDP workshop on Future of software engineering research
Multi-GPU and multi-CPU parallelization for interactive physics simulations

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Hierarchical work-stealing

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Hierarchical multithreading: programming model and system software

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Comparing the usability of library vs. language approaches to task parallelism

Evaluation and Usability of Programming Languages and Tools
Lifeline-based global load balancing

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A stream-computing extension to OpenMP

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Three layer cake for shared-memory programming

Proceedings of the 2010 Workshop on Parallel Programming Patterns
Space profiling for parallel functional programs

Journal of Functional Programming
Implicitly threaded parallelism in Manticore

Journal of Functional Programming
Semantics of concurrent revisions

ESOP'11/ETAPS'11 Proceedings of the 20th European conference on Programming languages and systems: part of the joint European conferences on theory and practice of software
Scheduling task parallelism on multi-socket multicore systems

Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Location-based memory fences

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
MDR: performance model driven runtime for heterogeneous parallel platforms

Proceedings of the international conference on Supercomputing
Unbalanced tree search on a manycore system using the GPI programming model

Computer Science - Research and Development
Experiments with the Fresh Breeze tree-based memory model

Computer Science - Research and Development
Pervasive parallelism for managed runtimes

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Parallel programming of general-purpose programs using task-based programming models

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Manycore work stealing

Proceedings of the 8th ACM International Conference on Computing Frontiers
A runtime implementation of OpenMP tasks

IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Work stealing for multi-core HPC clusters

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Hardware and software tradeoffs for task synchronization on manycore architectures

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Combining RTSJ with Fork/Join: a priority-based model

Proceedings of the 9th International Workshop on Java Technologies for Real-Time and Embedded Systems
Safe parallel programming using dynamic dependence hints

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Enhancing locality for recursive traversals of recursive structures

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Oracle scheduling: controlling granularity in implicitly parallel languages

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
AC: composable asynchronous IO for native languages

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
A parallel programming model for Ada

SIGAda '11 Proceedings of the 2011 ACM annual international conference on Special interest group on the Ada programming language
Periodic hierarchical load balancing for large supercomputers

International Journal of High Performance Computing Applications
Implementation of a hierarchical N-body simulator using the OmpSs programming model

Proceedings of the first workshop on Irregular applications: architectures and algorithms
Habanero-Java: the new adventures of old X10

Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Factory: an object-oriented parallel programming substrate for deep multiprocessors

HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Adaptive encoding of multimedia streams on MPSoC

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
FFT-based dense polynomial arithmetic on multi-cores

HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
Cooperative parallelization

Proceedings of the International Conference on Computer-Aided Design
On-line adaptive parallel prefix computation

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Massively parallel breadth first search using a tree-structured memory model

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Shared work list: hacking amorphous data parallelism in UPC

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Performance of parallel bit-reversal with Cilk and UPC for fast Fourier transform

GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
Deterministic parallel random-number generation for dynamic-multithreading platforms

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
A work-stealing scheduler for X10's task parallelism with suspension

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Aikido: accelerating shared data dynamic analyses

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Dataflow execution of sequential imperative programs on multicore architectures

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
BWS: balanced work stealing for time-sharing multicores

Proceedings of the 7th ACM european conference on Computer Systems
An efficient and flexible task management for many cores

Transactions on High-Performance Embedded Architectures and Compilers IV
Support for OpenMP tasks on cell architecture

ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Lightweight lexical closures for legitimate execution stack access

CC'06 Proceedings of the 15th international conference on Compiler Construction
Extendable pattern-oriented optimization directives

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A performance model for X10 applications: what's going on under the hood?

Proceedings of the 2011 ACM SIGPLAN X10 Workshop
DAG3: a tool for design and analysis of applications for multicore architectures

Proceedings of the 27th Annual ACM Symposium on Applied Computing
OpenMP task scheduling strategies for multicore NUMA systems

International Journal of High Performance Computing Applications
Dynamic synthesis for relaxed memory models

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Scalable and precise dynamic datarace detection for structured parallelism

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces

Proceedings of the 2012 international symposium on Memory Management
Data-driven fault tolerance for work stealing computations

Proceedings of the 26th ACM international conference on Supercomputing
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures

Proceedings of the 26th ACM international conference on Supercomputing
WSCOM: Online Task Scheduling with Data Transfers

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Design, verification and applications of a new read-write lock algorithm

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Memory-mapping support for reducer hyperobjects

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
The communication complexity of distributed task allocation

PODC '12 Proceedings of the 2012 ACM symposium on Principles of distributed computing
For extreme parallelism, your OS is Sooooo last-millennium

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms

IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Processor allocation for optimistic parallelization of irregular programs

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Performance driven cooperation between kernel and auto-tuning multi-threaded interval b&b applications

ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Extendable pattern-oriented optimization directives

ACM Transactions on Architecture and Code Optimization (TACO)
Haskell vs. f# vs. scala: a high-level language features and parallelism support comparison

Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Server-based scheduling of parallel real-time tasks

Proceedings of the tenth ACM international conference on Embedded software
Work-stealing without the baggage

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Software data-triggered threads

Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Interactive physical simulation on multicore architectures

EG PGV'09 Proceedings of the 9th Eurographics conference on Parallel Graphics and Visualization
Characterizing and mitigating work time inflation in task parallel programs

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Fast asymmetric thread synchronization

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Compiler support for lightweight context switching

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Transactional access to shared memory in starss, a task based programming model

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A new programming paradigm for GPGPU

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Programming support and scheduling for communicating parallel tasks

Journal of Parallel and Distributed Computing
Efficient data race detection for async-finish parallelism

Formal Methods in System Design
Variable permissions for concurrency verification

ICFEM'12 Proceedings of the 14th international conference on Formal Engineering Methods: formal methods and software engineering
StreamTMC: Stream compilation for tiled multi-core architectures

Journal of Parallel and Distributed Computing
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling

ACM Transactions on Architecture and Code Optimization (TACO)
Exploring heterogeneous scheduling using the task-centric programming model

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Checking and enforcing robustness against TSO

ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
Computational sprinting on a hardware/software testbed

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Portable performance on heterogeneous architectures

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A proper performance evaluation system that summarizes code placement effects

Proceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering
Scalanytics: a declarative multi-core platform for scalable composable traffic analytics

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Holistic run-time parallelism management for time and energy efficiency

Proceedings of the 27th ACM international conference on Supercomputing
Hybrid parallel task placement in X10

Proceedings of the third ACM SIGPLAN X10 Workshop
A divide and conquer approach and a work-optimal parallel algorithm for the LIS problem

Information Processing Letters
A work-stealing scheduling framework supporting fault tolerance

Proceedings of the Conference on Design, Automation and Test in Europe
ARTM: a lightweight fork-join framework for many-core embedded systems

Proceedings of the Conference on Design, Automation and Test in Europe
WeeFence: toward making fences free in TSO

Proceedings of the 40th Annual International Symposium on Computer Architecture
Interference resilient PDES on multi-core systems: towards proportional slowdown

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
On-the-fly pipeline parallelism

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Locality-aware task management for unstructured parallelism: a quantitative limit study

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

Proceedings of the High Performance Computing Symposium
LVars: lattice-based data structures for deterministic parallelism

Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Deterministic scale-free pipeline parallelism with hyperqueues

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Online feedback-directed optimizations for parallel Java code

Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
PICCO: a general-purpose compiler for private distributed computation

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs

ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DANBI: dynamic scheduling of irregular stream programs for many-core systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Real-time programming on accelerator many-core processors

Proceedings of the 2013 ACM SIGAda annual conference on High integrity language technology
Flexible filters in stream programs

ACM Transactions on Embedded Computing Systems (TECS)
Efficient multiprogramming for multicores with SCAF

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient work-stealing language runtimes

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Fence-free work stealing on bounded TSO processors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Ada and many-core platforms

ACM SIGAda Ada Letters
Well-structured futures and cache locality

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Concurrency testing using schedule bounding: an empirical study

Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Analysis of dependence tracking algorithms for task dataflow execution

ACM Transactions on Architecture and Code Optimization (TACO)
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions

Proceedings of the Eleventh ACM International Conference on Embedded Software
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures

Proceedings of Programming Models and Applications on Multicores and Manycores
Static safety guarantees for a low-level multithreaded language with regions

Science of Computer Programming
GLB: lifeline-based global load balancing library in x10

Proceedings of the first workshop on Parallel programming for analytics applications
Integrating profile-driven parallelism detection and machine-learning-based mapping

ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Characterizing and mitigating work time inflation in task parallel programs

Scientific Programming - Selected Papers from Super Computing 2012
Combined scheduling and mapping for scalable computing with parallel tasks

Scientific Programming - Biological Knowledge Discovery and Data Mining

Quantified Score

Hi-index	0.02

Abstract

The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.
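
The mutual-exclusion idea mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes a ready deque tracked by a head index, a tail index, and one lock: the owning worker pushes and pops at the tail without locking in the common case, while thieves always take the lock and remove from the head. The class and method names (`TheDeque`, `push`, `pop`, `steal`) are hypothetical, and the fixed capacity without wraparound is a simplification. This is not Cilk-5's actual runtime code, which is generated C; the sketch also relies on the Python interpreter's ordering of reads and writes, whereas an implementation on hardware with weaker memory ordering would need a fence between the worker's update of the tail and its read of the head.

```python
import threading


class TheDeque:
    """Illustrative ready deque with a Dijkstra-like protocol (hypothetical names).

    Live entries occupy slots[head:tail]. Only the owning worker calls
    push() and pop(); any other thread may call steal().
    """

    def __init__(self, capacity=1024):
        # Assumption: fixed capacity, no wraparound or overflow handling.
        self.slots = [None] * capacity
        self.head = 0  # advanced only by thieves, under the lock
        self.tail = 0  # modified only by the owning worker
        self.lock = threading.Lock()

    def push(self, item):
        # Worker fast path: store the item, then publish it by bumping tail.
        self.slots[self.tail] = item
        self.tail += 1

    def pop(self):
        # Worker: optimistically claim the last entry without locking.
        self.tail -= 1
        if self.head > self.tail:
            # Possible conflict with a thief: undo, then retry under the lock.
            self.tail += 1
            with self.lock:
                self.tail -= 1
                if self.head > self.tail:
                    # The deque really is empty (or the entry was stolen).
                    self.tail += 1
                    return None
        return self.slots[self.tail]

    def steal(self):
        # Thief: always locks, claims the oldest entry, backs off on conflict.
        with self.lock:
            self.head += 1
            if self.head > self.tail:
                self.head -= 1
                return None
            return self.slots[self.head - 1]


if __name__ == "__main__":
    d = TheDeque()
    for frame in ("a", "b", "c"):
        d.push(frame)
    print(d.steal())  # oldest entry, taken from the head
    print(d.pop())    # newest entry, taken from the tail
```

The design mirrors the work-first principle described in the abstract: the worker's common-case `pop` touches no lock, so the cost lands on the thief's path and on the rare conflict path instead of on every spawn and return.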