Large-scale network services can consist of tens of thousands of machines running thousands of unique software configurations spread across hundreds of physical networks. Testing such services for complex performance problems and configuration errors remains difficult. Existing testing techniques, such as simulation or running smaller instances of a service, are limited in how well they predict overall service behavior. Ideally, testing would be performed at the same scale and with the same configuration as the deployed service, but this is technically and economically infeasible at this time. We present DieCast, an approach to scaling network services in which we multiplex all of the nodes in a given service configuration as virtual machines (VMs) spread across a much smaller number of physical machines in a test harness. CPU, network, and disk are then accurately scaled to provide the illusion that each VM matches a machine from the original service, both in available computing resources and in communication behavior to remote service nodes. We present the architecture and evaluation of a system to support such experimentation and discuss its limitations. We show that, for a variety of services (including a commercial, high-performance, cluster-based file system) and resource utilization levels, DieCast matches the behavior of the original service while using a fraction of the physical resources.
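The resource scaling described above can be illustrated with a small arithmetic sketch. It is not DieCast's implementation. It assumes a time-dilation style of scaling, which the abstract does not spell out: each VM's clock runs slower than real time by a factor `scale`, so CPU, network bandwidth, and disk bandwidth are cut by that factor in real time and network latency is stretched by it. The names `MachineSpec`, `real_time_allocation`, and `perceived_spec` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineSpec:
    """Resources of one machine (hypothetical fields for this sketch)."""
    cpu_ghz: float
    net_mbps: float
    net_latency_ms: float
    disk_mbps: float


def real_time_allocation(target: MachineSpec, scale: int) -> MachineSpec:
    """Resources to give one VM in real time so that, under an assumed
    slowdown of `scale` in the VM's clock, it appears to match `target`."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    return MachineSpec(
        cpu_ghz=target.cpu_ghz / scale,                # fewer cycles per real second
        net_mbps=target.net_mbps / scale,              # less bandwidth per real second
        net_latency_ms=target.net_latency_ms * scale,  # delay stretched in real time
        disk_mbps=target.disk_mbps / scale,            # less disk throughput per real second
    )


def perceived_spec(allocated: MachineSpec, scale: int) -> MachineSpec:
    """What the VM observes when its clock runs `scale` times slower
    (the inverse of real_time_allocation, under the same assumption)."""
    if scale < 1:
        raise ValueError("scale must be at least 1")
    return MachineSpec(
        cpu_ghz=allocated.cpu_ghz * scale,
        net_mbps=allocated.net_mbps * scale,
        net_latency_ms=allocated.net_latency_ms / scale,
        disk_mbps=allocated.disk_mbps * scale,
    )


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    original = MachineSpec(cpu_ghz=2.0, net_mbps=1000.0,
                           net_latency_ms=1.0, disk_mbps=80.0)
    share = real_time_allocation(original, scale=10)
    print(share)
    print(perceived_spec(share, scale=10))
```

Under this assumption, ten VMs, each holding a one-tenth real-time share, can occupy one physical machine while each still perceives the resources of an original service node. The cost is that the test runs roughly `scale` times longer in wall-clock time.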