OceanStore: an architecture for global-scale persistent storage
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers
Neko: A Single Environment to Simulate and Prototype Distributed Algorithms
ICOIN '01 Proceedings of the The 15th International Conference on Information Networking
PAST: A Large-Scale, Persistent Peer-to-Peer Storage Utility
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
Capriccio: scalable threads for internet services
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Ivy: a read/write peer-to-peer file system
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Scalability and accuracy in a large-scale network emulator
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Simulating Large-Scale P2P Systems with the WiDS Toolkit
MASCOTS '05 Proceedings of the 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
MACEDON: methodology for automatically creating, evaluating, and designing overlay networks
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Using queries for distributed monitoring and forensics
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
BitVault: a highly reliable distributed data retention platform
ACM SIGOPS Operating Systems Review - Systems work at Microsoft Research
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Opis: reliable distributed systems in OCaml
Proceedings of the 4th international workshop on Types in language design and implementation
DSF: a common platform for distributed systems research and development
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
DSF: a common platform for distributed systems research and development
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
WiDS checker: combating bugs in distributed systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
InContext: simple parallelism for distributed applications
Proceedings of the 20th international symposium on High performance distributed computing
Hi-index | 0.00 |
Faced with a proliferation of distributed systems in research and production groups, we have devised the WiDS ecosystem of technologies to optimize the development and testing process for such systems. WiDS optimizes the process of developing an algorithm, testing its correctness in a debuggable environment, and testing its behavior at large scales in a distributed simulation. We have developed many distributed protocols and systems using WiDS, including a large-scale backup service that is robust enough to be deployed. We have also used WiDS to perform ultra-large scale (1million instances) simulation of a production protocol. In this paper, we describe the principles and design of WiDS, share the lessons that we learned, and discuss on-going research that will further reduce programming and debugging difficulties of distributed systems.