Many important applications fall into the broad class of iterative convergent algorithms. Parallel implementations of these algorithms are naturally expressed using the Bulk Synchronous Parallel (BSP) model of computation. However, BSP implementations are plagued by the straggler problem: a transient slowdown of any one thread can delay all other threads. This paper presents the Stale Synchronous Parallel (SSP) model, a generalization of BSP that preserves many of its advantages while avoiding the straggler problem. Algorithms using SSP can execute efficiently even when some threads are significantly delayed.
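The relationship between BSP and SSP can be illustrated with a minimal sketch. It assumes the usual bounded-staleness formulation, which the abstract itself does not spell out: each thread counts its completed iterations, and a thread may start its next iteration only if it is at most `staleness` iterations ahead of the slowest thread. Under that assumption, `staleness = 0` behaves like a BSP barrier. The names `SSPClock`, `may_proceed`, `advance`, `wait_until_allowed`, and `run_workers` are hypothetical and are not taken from the paper.

```python
import threading


class SSPClock:
    """Hypothetical helper: per-worker iteration counters with a staleness bound."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers   # completed iterations per worker
        self.staleness = staleness        # assumed bound; 0 gives BSP-like barriers
        self.cond = threading.Condition()

    def may_proceed(self, worker):
        # A worker may run if its lead over the slowest worker is within the bound.
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def advance(self, worker):
        # Record one completed iteration and wake any workers waiting on the bound.
        with self.cond:
            self.clocks[worker] += 1
            self.cond.notify_all()

    def wait_until_allowed(self, worker):
        # Block while this worker is too far ahead; return its lead once allowed.
        with self.cond:
            self.cond.wait_for(lambda: self.may_proceed(worker))
            return self.clocks[worker] - min(self.clocks)


def run_workers(num_workers, iterations, staleness):
    """Run placeholder workers; return the largest lead seen and the final clocks."""
    clock = SSPClock(num_workers, staleness)
    leads = []

    def work(worker):
        for _ in range(iterations):
            leads.append(clock.wait_until_allowed(worker))
            # One iteration of the convergent algorithm would run here.
            clock.advance(worker)

    threads = [threading.Thread(target=work, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(leads), clock.clocks
```

In this sketch the slowest worker is never blocked, so a transient delay in one thread stalls the others only once they have exhausted the allowed lead, rather than at every iteration as with a barrier.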