Implementation bugs are a critical problem in wide-area networks. The software running on core routers is subject to vulnerabilities, coding mistakes, and misconfiguration. Unfortunately, these problems are often discovered only after deployment in live networks, where they cause outages, leave networks open to attack, and are difficult to localize and debug. In this work, we propose a bug-tolerant router that runs multiple diverse copies of router software in parallel, so that the copies are unlikely to fail at the same time. Diversity is achieved by varying the ordering and timing of routing messages, running different routing protocols, or running code written by different implementers, among other techniques. Because the copies differ, they are likely to produce different outputs when an error occurs, so a simple voting procedure decides which copy's output will "drive" packet forwarding and control-plane communication with other routers. In this paper we motivate our design, describe some design decisions and tradeoffs, and conclude with a description of our ongoing work on building a prototype of this architecture.
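The voting step can be illustrated with a minimal sketch. The abstract says only that a "simple voting procedure" is used, so the strict-majority rule below is an assumption, and the function name `vote` and the sample next-hop addresses are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the output that a strict majority of copies agree on.

    Assumption: a strict majority is required. Returns None when there
    are no outputs or when no single output has more than half the votes.
    """
    if not outputs:
        return None
    winner, count = Counter(outputs).most_common(1)[0]
    return winner if count > len(outputs) / 2 else None


# Hypothetical next hops computed for one prefix by three diverse copies.
# Two copies agree; the third is presumed to have hit a bug.
next_hops = ["10.0.0.1", "10.0.0.1", "10.0.0.9"]
chosen = vote(next_hops)
```

Under this rule the agreed-upon next hop would drive forwarding. Returning `None` when no majority exists leaves the fallback policy to the caller, which is one possible design choice rather than anything the abstract specifies.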