Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
MPI: a message passing interface
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Dummynet: a simple approach to the evaluation of network protocols
ACM SIGCOMM Computer Communication Review
Bochs: A Portable PC Emulator for Unix/X
Linux Journal
Genesis: a system for large-scale parallel network simulation
Proceedings of the sixteenth workshop on Parallel and distributed simulation
Performance and scalability of EJB applications
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Congestion control for high bandwidth-delay product networks
Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications
The Georgia Tech Network Simulator
MoMeTools '03 Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
A solver for the network testbed mapping problem
ACM SIGCOMM Computer Communication Review
Memory resource management in VMware ESX server
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Resource overbooking and application profiling in shared hosting platforms
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
An integrated experimental environment for distributed systems and networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Scalability and accuracy in a large-scale network emulator
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Emergent (mis)behavior vs. complex software systems
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Monkey see, monkey do: a tool for TCP tracing and replaying
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
TCP offload is a dumb idea whose time has come
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Parallax: managing storage for a million machines
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Glacier: highly durable, decentralized storage despite massive correlated failures
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Quorum: flexible quality of service for internet services
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Why do internet services fail, and what can be done about it?
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Model-based resource provisioning in a web service utility
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Experiences building PlanetLab
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Evaluating distributed systems: does background traffic matter?
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Difference engine: harnessing memory redundancy in virtual machines
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
High-fidelity switch models for software-defined network emulation
Proceedings of the second ACM SIGCOMM workshop on Hot topics in software defined networking
Challenges in the emulation of large scale software defined networks
Proceedings of the 4th Asia-Pacific Workshop on Systems
Hi-index | 0.00 |
Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains a difficult problem. Existing testing techniques, such as simulation or running smaller instances of a service, have limitations in predicting overall service behavior at such scales. Testing large services should ideally be done at the same scale and configuration as the target deployment, which can be technically and economically infeasible. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines across a much smaller number of physical machines in a test harness. We show how to accurately scale CPU, network, and disk to provide the illusion that each VM matches a machine in the original service in terms of both available computing resources and communication behavior. We present the architecture and evaluation of a system we built to support such experimentation and discuss its limitations. We show that for a variety of services---including a commercial high-performance cluster-based file system---and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.