In this paper, we present the design of Cilk-NOW, a runtime system that adaptively and reliably executes functional Cilk programs in parallel on a network of UNIX workstations. Cilk (pronounced "silk") is a parallel multithreaded extension of the C language, and all Cilk runtime systems employ a provably efficient thread-scheduling algorithm. Cilk-NOW is one such runtime system; in addition, it automatically delivers adaptive and reliable execution for a functional subset of Cilk programs. By adaptive execution, we mean that each Cilk program dynamically utilizes a changing set of otherwise-idle workstations. By reliable execution, we mean that both the Cilk-NOW system as a whole and each executing Cilk program tolerate machine and network faults. Cilk-NOW provides these features while programs remain fault-oblivious: Cilk programmers need not write any code for fault tolerance. Throughout this paper, we focus on end-to-end design decisions, and we show how they let the design exploit high-level algorithmic properties of the Cilk programming model to simplify and streamline the implementation.
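To make the link between "functional" and "fault-oblivious" concrete, here is a minimal sketch in Python. It is not Cilk code and not Cilk-NOW's actual mechanism. It only illustrates the general idea that a task without side effects can be re-executed after its worker disappears and still return the same result, so the program itself needs no recovery logic. The names `WorkerLost`, `run_on_flaky_worker`, and `run_reliably`, and the simulated failure rate, are hypothetical and chosen for illustration only.

```python
import random


def fib(n):
    """A pure (functional) task: the result depends only on n."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


class WorkerLost(Exception):
    """Hypothetical signal that the worker running a task left or crashed."""


def run_on_flaky_worker(task, arg, rng, failure_rate):
    """Run task(arg), but simulate losing the worker some of the time."""
    if rng.random() < failure_rate:
        raise WorkerLost()
    return task(arg)


def run_reliably(task, arg, rng, failure_rate=0.5):
    """Re-execute the task until some simulated worker completes it.

    This is safe only because the task is pure: a repeated run cannot
    observe or duplicate any effect of an earlier, abandoned run.
    Returns the result together with the number of attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return run_on_flaky_worker(task, arg, rng, failure_rate), attempts
        except WorkerLost:
            continue  # the task body contains no recovery code at all


rng = random.Random(0)  # fixed seed so the simulation is repeatable
result, attempts = run_reliably(fib, 20, rng)
print(result)  # 6765, however many attempts were needed
```

Note that `fib` contains nothing related to failures; all retry logic lives in the surrounding runner. A task that wrote to a file or sent a message could not be retried this blindly, which is one way to see why the guarantees are stated for a functional subset of programs.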