Robustness in Complex Systems

Authors:
Steven D. Gribble
Venue:
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
Year:
2001

Citing 0
Cited 20

Measuring the Robustness of a Resource Allocation

IEEE Transactions on Parallel and Distributed Systems
Improving storage system availability with D-GRAID

ACM Transactions on Storage (TOS)
Improving Storage System Availability with D-GRAID (awarded Best Student Paper)

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
An online evolutionary approach to developing internet services

EW 10 Proceedings of the 10th workshop on ACM SIGOPS European workshop
A utility-centered approach to building dependable infrastructure services

EW 10 Proceedings of the 10th workshop on ACM SIGOPS European workshop
Emergent (mis)behavior vs. complex software systems

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Database-aware semantically-smart storage

FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
A Robust Spanning Tree Topology for Data Collection and Dissemination in Distributed Environments

IEEE Transactions on Parallel and Distributed Systems
Graceful degradation via versions: specifications and implementations

Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Ironmodel: robust performance models in the wild
Software maturity: design as dark art

ACM SIGSOFT Software Engineering Notes
A Multidisciplinary Framework For Resilience To Disasters And Disruptions

Journal of Integrated Design & Process Science
Avalanche Dynamics in Grids: Indications of SOC or HOT?

Proceedings of the 2005 conference on Self-Organization and Autonomic Informatics (I)
A case for on-machine load balancing

Journal of Parallel and Distributed Computing
Efficient middleware for byzantine fault tolerant database replication

Proceedings of the sixth conference on Computer systems
Improving storage system availability with D-GRAID

FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
The robustness of resource allocations in parallel and distributed computing systems

ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Automated diagnosis without predictability is a recipe for failure

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
Evolutionary mechanics: new engineering principles for the emergence of flexibility in a dynamic and uncertain world

Natural Computing: an international journal
Failure recovery: when the cure is worse than the disease

HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems

Abstract

This paper argues that a common design paradigm for systems is fundamentally flawed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this flawed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to design its basic operational mechanisms. However, as a system grows in complexity, the diffuse coupling between the components in the system inevitably leads to the butterfly effect, in which small perturbations can result in large changes in behavior. We explore this in the context of distributed data structures, a scalable, cluster-based storage server. We then consider a number of design techniques that help a system to be robust in the face of the unexpected, including overprovisioning, admission control, introspection, and adaptivity through closed control loops. Ultimately, however, all complex systems eventually must contend with the unpredictable. Because of this, we believe systems should be designed to cope with failure gracefully.