MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
VLSI Support for a cactus stack oriented memory organization
Proceedings of the Twenty-First Annual Hawaii International Conference on Architecture Track
A bridging model for parallel computation
Communications of the ACM
Introduction to algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Polling efficiently on stock hardware
FPCA '93 Proceedings of the conference on Functional programming languages and computer architecture
Whole-program optimization for time and space efficient threads
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Cilk system for parallel multithreaded computing
The Cilk system for parallel multithreaded computing
Lazy threads: implementing a fast parallel call
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Cilk: an efficient multithreaded runtime system
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Executing multithreaded programs efficiently
Executing multithreaded programs efficiently
Efficient detection of determinacy races in Cilk programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Detecting data races in Cilk programs that use locks
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Solution of a problem in concurrent programming control
Communications of the ACM
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Parallel Symbolic Computing in Cid
PSLS '95 Proceedings of the International Workshop on Parallel Symbolic Languages and Systems
Garbage Collection is Fast, but a Stack is Faster
Garbage Collection is Fast, but a Stack is Faster
The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem
The Function of FUNCTION in LISP, or Why the FUNARG Problem Should be Called the Environment Problem
Efficient large-scale process-oriented parallel simulations
Proceedings of the 30th conference on Winter simulation
StackThreads/MP: integrating futures into calling standards
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Pointer analysis for multithreaded programs
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Scheduling threads for low space requirement and good locality
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
PADS '00 Proceedings of the fourteenth workshop on Parallel and distributed simulation
Proceedings of the ACM 2000 conference on Java Grande
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel randomized best-first minimax search
Artificial Intelligence
Pointer analysis for structured parallel programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Suboptimal Minimum Cluster Volume Cover-Based Method for Measuring Fractal Dimension
IEEE Transactions on Pattern Analysis and Machine Intelligence
Evaluating the XMT Parallel Programming Model
HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky
ICCS '02 Proceedings of the International Conference on Computational Science-Part II
Next Generation System Software for Future High-End Computing Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Production Job Scheduling for Parallel Shared Memory Systems
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Online Computation of Critical Paths for Multithreaded Languages
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Recursion Unrolling for Divide and Conquer Programs
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Fusion of Concurrent Invocations of Exclusive Methods
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
A Case Study of Load Distribution in Parallel View Frustum Culling and Collision Detection
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic evaluation: an adaptive evaluation strategy for non-strict programs
ICFP '03 Proceedings of the eighth ACM SIGPLAN international conference on Functional programming
SilkRoad II: mixed paradigm cluster computing with RC_dag consistency
Parallel Computing
A comparative analysis of fine-grain threads packages
Journal of Parallel and Distributed Computing
Run-Time Support for the Automatic Parallelization of Java Programs
The Journal of Supercomputing
A fast Fourier transform compiler
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Parallel and fully recursive multifrontal sparse Cholesky
Future Generation Computer Systems - Special issue: Selected numerical algorithms
On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Transparent proxies for Java futures
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
TiNy Threads: A Thread Virtual Machine for the Cyclops64 Cellular Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 14 - Volume 15
Adding parallelism to visual data flow programs
SoftVis '05 Proceedings of the 2005 ACM symposium on Software visualization
Symbolic bounds analysis of pointers, array indices, and accessed memory regions
ACM Transactions on Programming Languages and Systems (TOPLAS)
A generic approach to parallel chart parsing with an application to LinGO
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Adaptive scheduling with parallelism feedback
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
SmartApps: middle-ware for adaptive applications on reconfigurable platforms
ACM SIGOPS Operating Systems Review
The cache complexity of multithreaded cache oblivious algorithms
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Programming with exceptions in JCilk
Science of Computer Programming - Special issue: Synchronization and concurrency in object-oriented languages
Adaptive work stealing with parallelism feedback
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling DAGs on asynchronous processors
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Iterative context bounding for systematic testing of multithreaded programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Multithreaded programming in Cilk
Proceedings of the 2007 international workshop on Parallel symbolic computation
Adaptive loops with Kaapi on multicore and grid: applications in symmetric cryptography
Proceedings of the 2007 international workshop on Parallel symbolic computation
A formal model of a system for automated program parallelization
Programming and Computer Software
Supporting exception handling for futures in Java
Proceedings of the 5th international symposium on Principles and practice of programming in Java
Parallel unsymmetric-pattern multifrontal sparse LU with column preordering
ACM Transactions on Mathematical Software (TOMS)
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Measuring and Evaluating Parallel State-Space Exploration Algorithms
Electronic Notes in Theoretical Computer Science (ENTCS)
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS)
ACM SIGARCH Computer Architecture News
A scheduling framework for general-purpose parallel languages
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
An adaptive cut-off for task parallelism
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Fine Grain Distributed Implementation of a Dataflow Language with Provable Performances
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part II
A Proposal for Task Parallelism in OpenMP
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
An Efficient OpenMP Runtime System for Hierarchical Architectures
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Deque-Free Work-Optimal Parallel STL Algorithms
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
An Experimental Evaluation of the New OpenMP Tasking Model
Languages and Compilers for Parallel Computing
Proceedings of the 4th workshop on Declarative aspects of multicore programming
gluepy: A Simple Distributed Python Programming Framework for Complex Grid Environments
Languages and Compilers for Parallel Computing
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Backtracking-based load balancing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Serialization sets: a dynamic dependence-based parallel execution model
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Architectural support for Cilk computations on many-core architectures
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
CellSs: Scheduling techniques to better exploit memory hierarchy
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Kendo: efficient deterministic multithreading in software
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
As-if-serial exception handling semantics for Java futures
Science of Computer Programming
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM SIGARCH Computer Architecture News
Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Dynamic Task and Data Placement over NUMA Architectures: An OpenMP Runtime Perspective
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Hierarchical Task-Based Programming With StarSs
International Journal of High Performance Computing Applications
Reducers and other Cilk++ hyperobjects
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Brief announcement: a lower bound for depth-restricted work stealing
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Experience with SC: transformation-based implementation of various extensions to C
Proceedings of the 2007 International Lisp Conference
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
The Cilk++ concurrency platform
Proceedings of the 46th Annual Design Automation Conference
The design of a task parallel library
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Autotuning multigrid with PetaBricks
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
PFunc: modern task parallelism for modern high performance computing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Helper locks for fork-join parallel programming
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Analyzing lock contention in multithreaded applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Performance Evaluation of Work Stealing for Streaming Applications
OPODIS '09 Proceedings of the 13th International Conference on Principles of Distributed Systems
Race-free and memory-safe multithreading: design and implementation in Cyclone
Proceedings of the 5th ACM SIGPLAN workshop on Types in language design and implementation
Lightweight asynchrony using parasitic threads
Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
Satin: A high-level and efficient grid programming model
ACM Transactions on Programming Languages and Systems (TOPLAS)
A randomized scheduler with probabilistic guarantees of finding bugs
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
The Cilk++ concurrency platform
The Journal of Supercomputing
Computer Languages, Systems and Structures
UTS: an unbalanced tree search benchmark
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
STAPL: an adaptive, generic parallel C++ library
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Parallelising symbolic state-space generators
CAV'07 Proceedings of the 19th international conference on Computer aided verification
An adaptive task creation strategy for work-stealing scheduling
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Memory models: a case for rethinking parallel languages and hardware
Communications of the ACM
Evaluation of OpenMP task scheduling strategies
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Extending the OpenMP tasking model to allow dependent tasks
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Scheduling dynamic OpenMP applications over multicore architectures
IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Bamboo: a data-centric, object-oriented approach to many-core software
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
The Cilkview scalability analyzer
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Brief announcement: serial-parallel reciprocity in dynamic multithreaded languages
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Simplifying concurrent algorithms by exploiting hardware transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Online mapping of MPI-2 dynamic tasks to processes and threads
International Journal of High Performance Systems Architecture
A mean field model of work stealing in large-scale systems
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
STAPL: standard template adaptive parallel library
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
Balanced dense polynomial multiplication on multi-cores
ACM Communications in Computer Algebra
Parallel computation of the minimal elements of a poset
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Hardware/software support for adaptive work-stealing in on-chip multiprocessor
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
The Paralax infrastructure: automatic parallelization with a helping hand
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Using memory mapping to support cactus stacks in work-stealing runtime systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Gossamer: a lightweight programming framework for multicore machines
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Concurrent programming with revisions and isolation types
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Concurrency by modularity: design patterns, a case in point
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Proceedings of the Conference on Design, Automation and Test in Europe
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Building scalable software systems in the multicore era
Proceedings of the FSE/SDP workshop on Future of software engineering research
Multi-GPU and multi-CPU parallelization for interactive physics simulations
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Hierarchical multithreading: programming model and system software
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Comparing the usability of library vs. language approaches to task parallelism
Evaluation and Usability of Programming Languages and Tools
Lifeline-based global load balancing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
A stream-computing extension to OpenMP
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Three layer cake for shared-memory programming
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Space profiling for parallel functional programs
Journal of Functional Programming
Implicitly threaded parallelism in Manticore
Journal of Functional Programming
Semantics of concurrent revisions
ESOP'11/ETAPS'11 Proceedings of the 20th European conference on Programming languages and systems: part of the joint European conferences on theory and practice of software
Scheduling task parallelism on multi-socket multicore systems
Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers
Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
MDR: performance model driven runtime for heterogeneous parallel platforms
Proceedings of the international conference on Supercomputing
Unbalanced tree search on a manycore system using the GPI programming model
Computer Science - Research and Development
Experiments with the Fresh Breeze tree-based memory model
Computer Science - Research and Development
Pervasive parallelism for managed runtimes
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Parallel programming of general-purpose programs using task-based programming models
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Proceedings of the 8th ACM International Conference on Computing Frontiers
A runtime implementation of OpenMP tasks
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Work stealing for multi-core HPC clusters
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Hardware and software tradeoffs for task synchronization on manycore architectures
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Combining RTSJ with Fork/Join: a priority-based model
Proceedings of the 9th International Workshop on Java Technologies for Real-Time and Embedded Systems
Safe parallel programming using dynamic dependence hints
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Enhancing locality for recursive traversals of recursive structures
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Oracle scheduling: controlling granularity in implicitly parallel languages
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
AC: composable asynchronous IO for native languages
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
A parallel programming model for Ada
SIGAda '11 Proceedings of the 2011 ACM annual international conference on Special interest group on the Ada programming language
Periodic hierarchical load balancing for large supercomputers
International Journal of High Performance Computing Applications
Implementation of a hierarchical N-body simulator using the OmpSs programming model
Proceedings of the first workshop on Irregular applications: architectures and algorithms
Habanero-Java: the new adventures of old X10
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Factory: an object-oriented parallel programming substrate for deep multiprocessors
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Adaptive encoding of multimedia streams on MPSoC
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part IV
FFT-based dense polynomial arithmetic on multi-cores
HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
Proceedings of the International Conference on Computer-Aided Design
On-line adaptive parallel prefix computation
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Massively parallel breadth first search using a tree-structured memory model
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Shared work list: hacking amorphous data parallelism in UPC
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Performance of parallel bit-reversal with Cilk and UPC for fast Fourier transform
GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
Deterministic parallel random-number generation for dynamic-multithreading platforms
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
A work-stealing scheduler for X10's task parallelism with suspension
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Aikido: accelerating shared data dynamic analyses
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Dataflow execution of sequential imperative programs on multicore architectures
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
An efficient and flexible task management for many cores
Transactions on High-Performance Embedded Architectures and Compilers IV
Support for OpenMP tasks on cell architecture
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Lightweight lexical closures for legitimate execution stack access
CC'06 Proceedings of the 15th international conference on Compiler Construction
Extendable pattern-oriented optimization directives
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A performance model for X10 applications: what's going on under the hood?
Proceedings of the 2011 ACM SIGPLAN X10 Workshop
DAG3: a tool for design and analysis of applications for multicore architectures
Proceedings of the 27th Annual ACM Symposium on Applied Computing
OpenMP task scheduling strategies for multicore NUMA systems
International Journal of High Performance Computing Applications
Dynamic synthesis for relaxed memory models
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Scalable and precise dynamic datarace detection for structured parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
The Myrmics memory allocator: hierarchical, message-passing allocation for global address spaces
Proceedings of the 2012 international symposium on Memory Management
Data-driven fault tolerance for work stealing computations
Proceedings of the 26th ACM international conference on Supercomputing
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures
Proceedings of the 26th ACM international conference on Supercomputing
WSCOM: Online Task Scheduling with Data Transfers
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Design, verification and applications of a new read-write lock algorithm
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Memory-mapping support for reducer hyperobjects
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
The communication complexity of distributed task allocation
PODC '12 Proceedings of the 2012 ACM symposium on Principles of distributed computing
For extreme parallelism, your OS is Sooooo last-millennium
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
LIBKOMP, an efficient OpenMP runtime system for both fork-join and data flow paradigms
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Processor allocation for optimistic parallelization of irregular programs
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)
Haskell vs. f# vs. scala: a high-level language features and parallelism support comparison
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
Server-based scheduling of parallel real-time tasks
Proceedings of the tenth ACM international conference on Embedded software
Work-stealing without the baggage
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Software data-triggered threads
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Interactive physical simulation on multicore architectures
EG PGV'09 Proceedings of the 9th Eurographics conference on Parallel Graphics and Visualization
Characterizing and mitigating work time inflation in task parallel programs
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Fast asymmetric thread synchronization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Compiler support for lightweight context switching
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Transactional access to shared memory in starss, a task based programming model
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A new programming paradigm for GPGPU
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Programming support and scheduling for communicating parallel tasks
Journal of Parallel and Distributed Computing
Efficient data race detection for async-finish parallelism
Formal Methods in System Design
Variable permissions for concurrency verification
ICFEM'12 Proceedings of the 14th international conference on Formal Engineering Methods: formal methods and software engineering
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
Exploring heterogeneous scheduling using the task-centric programming model
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Checking and enforcing robustness against TSO
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
Computational sprinting on a hardware/software testbed
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Portable performance on heterogeneous architectures
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A proper performance evaluation system that summarizes code placement effects
Proceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering
Scalanytics: a declarative multi-core platform for scalable composable traffic analytics
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Holistic run-time parallelism management for time and energy efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hybrid parallel task placement in X10
Proceedings of the third ACM SIGPLAN X10 Workshop
A divide and conquer approach and a work-optimal parallel algorithm for the LIS problem
Information Processing Letters
A work-stealing scheduling framework supporting fault tolerance
Proceedings of the Conference on Design, Automation and Test in Europe
ARTM: a lightweight fork-join framework for many-core embedded systems
Proceedings of the Conference on Design, Automation and Test in Europe
WeeFence: toward making fences free in TSO
Proceedings of the 40th Annual International Symposium on Computer Architecture
Interference resilient PDES on multi-core systems: towards proportional slowdown
Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
On-the-fly pipeline parallelism
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale
Proceedings of the High Performance Computing Symposium
LVars: lattice-based data structures for deterministic parallelism
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Online feedback-directed optimizations for parallel Java code
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
PICCO: a general-purpose compiler for private distributed computation
Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
DANBI: dynamic scheduling of irregular stream programs for many-core systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Real-time programming on accelerator many-core processors
Proceedings of the 2013 ACM SIGAda annual conference on High integrity language technology
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Efficient multiprogramming for multicores with SCAF
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient work-stealing language runtimes
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Fence-free work stealing on bounded TSO processors
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ACM SIGAda Ada Letters
Well-structured futures and cache locality
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Concurrency testing using schedule bounding: an empirical study
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Analysis of dependence tracking algorithms for task dataflow execution
ACM Transactions on Architecture and Code Optimization (TACO)
StreaMorph: a case for synthesizing energy-efficient adaptive programs using high-level abstractions
Proceedings of the Eleventh ACM International Conference on Embedded Software
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures
Proceedings of Programming Models and Applications on Multicores and Manycores
Static safety guarantees for a low-level multithreaded language with regions
Science of Computer Programming
GLB: lifeline-based global load balancing library in x10
Proceedings of the first workshop on Parallel programming for analytics applications
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Characterizing and mitigating work time inflation in task parallel programs
Scientific Programming - Selected Papers from Super Computing 2012
Combined scheduling and mapping for scalable computing with parallel tasks
Scientific Programming - Biological Knowledge Discovery and Data Mining
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to that of the first system, but the language has been completely redesigned and the runtime system completely reengineered. The efficiency of the new implementation was aided by a clear strategy that arose from a theoretical analysis of the scheduling algorithm: concentrate on minimizing overheads that contribute to the work, even at the expense of overheads that contribute to the critical path. Although it may seem counterintuitive to move overheads onto the critical path, this "work-first" principle has led to a portable Cilk-5 implementation in which the typical cost of spawning a parallel thread is only between 2 and 6 times the cost of a C function call on a variety of contemporary machines. Many Cilk programs run on one processor with virtually no degradation compared to equivalent C programs. This paper describes how the work-first principle was exploited in the design of Cilk-5's compiler and its runtime system. In particular, we present Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler.
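To make the idea of a ready deque guarded by an optimistic, Dijkstra-style protocol more concrete, the following is a minimal sketch of how such a deque might be organized. It is an illustration under stated assumptions, not Cilk-5's actual code: the class name `ReadyDeque`, the fields `head`, `tail`, and `slots`, and the fixed capacity are all hypothetical. The owning worker pushes and pops at the tail without taking a lock on the common path, while thieves steal from the head under a lock. The owner falls back to the lock only when the two ends may have collided, which keeps the overhead on the work side small, in the spirit of the work-first principle. A real implementation would also need explicit memory-ordering guarantees between the two index updates, which this Python sketch does not model.

```python
import threading


class ReadyDeque:
    """Illustrative work-stealing deque (hypothetical names, not Cilk-5's API).

    The owner works at the tail; thieves work at the head.
    Live tasks occupy slots[head .. tail-1].
    """

    def __init__(self, capacity=1024):
        self.slots = [None] * capacity  # assumed fixed-size storage, no overflow check
        self.head = 0                   # index of the oldest task (thief end)
        self.tail = 0                   # index of the next free slot (owner end)
        self.lock = threading.Lock()    # held by thieves; by the owner only on conflict

    def push(self, task):
        # Owner only: lock-free on the common path.
        self.slots[self.tail] = task
        self.tail += 1

    def pop(self):
        # Owner only: optimistically claim the newest task, then check for a conflict.
        self.tail -= 1
        if self.head > self.tail:
            # Either the deque was empty or a thief is racing for the same task:
            # undo the claim and retry while holding the lock.
            self.tail += 1
            with self.lock:
                self.tail -= 1
                if self.head > self.tail:
                    self.tail += 1
                    return None  # nothing left for the owner
        return self.slots[self.tail]

    def steal(self):
        # Thief: always takes the lock, then claims the oldest task.
        with self.lock:
            self.head += 1
            if self.head > self.tail:
                self.head -= 1
                return None  # deque empty, or the owner got there first
            return self.slots[self.head - 1]


# Single-threaded walk-through of the three operations.
dq = ReadyDeque()
for name in ("a", "b", "c"):
    dq.push(name)
stolen = dq.steal()  # oldest task leaves from the head
newest = dq.pop()    # newest task leaves from the tail
```

In this sketch the thief pays for the lock on every steal, whereas the owner pays for it only in the rare case where the head and tail indices cross. That asymmetry is the design choice the sketch is meant to illustrate: cost is shifted away from the frequently executed owner path and onto the infrequent steal path.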