The Vision of Autonomic Computing
Computer
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Stardust: tracking activity in a distributed storage system
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Emergent (mis)behavior vs. complex software systems
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Ursa minor: versatile cluster-based storage
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Path-based faliure and evolution management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Argon: performance insulation for shared storage servers
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
DARC: dynamic analysis of root causes of latency distributions
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Reference-driven performance anomaly identification
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Detailed diagnosis in enterprise networks
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
Black-box problem diagnosis in parallel file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Behavior-based problem localization for parallel file systems
HotDep'10 Proceedings of the Sixth international conference on Hot topics in system dependability
mClock: handling throughput variability for hypervisor IO scheduling
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Scale and concurrency of GIGA+: file system directories with millions of files
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Diagnosing performance changes by comparing request flows
Proceedings of the 8th USENIX conference on Networked systems design and implementation
Dominant resource fairness: fair allocation of multiple resource types
Proceedings of the 8th USENIX conference on Networked systems design and implementation
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Disks are like snowflakes: no two are alike
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Warehouse-Scale Computing: Entering the Teenage Decade
Proceedings of the 38th annual international symposium on Computer architecture
Jockey: guaranteed job latency in data parallel clusters
Proceedings of the 7th ACM european conference on Computer Systems
Structured comparative analysis of systems logs to diagnose performance problems
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Automated management is critical to the success of cloud computing, given its scale and complexity. But, most systems do not satisfy one of the key properties required for automation: predictability, which in turn relies upon low variance. Most automation tools are not effective when variance is consistently high. Using automated performance diagnosis as a concrete example, this position paper argues that for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for reasoning about sources of variance in distributed systems and describe an example tool for helping identify them.