Pico replication: a high availability framework for middleboxes

Authors:
Shriram Rajagopalan;Dan Williams;Hani Jamjoom
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY and University of British Columbia, Vancouver, Canada;IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the 4th annual Symposium on Cloud Computing
Year:
2013

Citing 24
Cited 0

Optimistic recovery in distributed systems

ACM Transactions on Computer Systems (TOCS)
Hypervisor-based fault tolerance

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Managing update conflicts in Bayou, a weakly connected replicated storage system

SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Using the heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks

Theoretical Computer Science
Bro: a system for detecting network intruders in real-time

Computer Networks: The International Journal of Computer and Telecommunications Networking
Chord: A scalable peer-to-peer lookup service for internet applications

Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Weighted voting for replicated data

SOSP '79 Proceedings of the seventh ACM symposium on Operating systems principles
Xen and the art of virtualization

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
On the performance of middleboxes

Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
ReVirt: enabling intrusion analysis through virtual-machine logging and replay

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Session state: beyond soft state

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Live migration of virtual machines

NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Dynamo: amazon's highly available key-value store

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Execution replay of multiprocessor virtual machines

Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Remus: high availability via asynchronous virtual machine replication

NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
FAWN: a fast array of wimpy nodes

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
A gossip-style failure detection service

Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing
Respec: efficient online multiprocessor replayvia speculation and external determinism

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Web timeouts and their implications

PAM'10 Proceedings of the 11th international conference on Passive and active measurement
SecondSite: disaster tolerance as a service

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Design and implementation of a consolidated middlebox architecture

NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Netmap: a novel framework for fast packet I/O

USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
xOMB: extensible open middleboxes with commodity servers

Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
Split/merge: system support for elastic execution in virtual middleboxes

nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Middleboxes are being rearchitected to be service oriented, composable, extensible, and elastic. Yet system-level support for high availability (HA) continues to introduce significant performance overhead. In this paper, we propose Pico Replication (PR), a system-level framework for middleboxes that exploits their flow-centric structure to achieve low overhead, fully customizable HA. Unlike generic (virtual machine level) techniques, PR operates at the flow level. Individual flows can be checkpointed at very high frequencies while the middlebox continues to process other flows. Furthermore, each flow can have its own checkpoint frequency, output buffer and target for backup, enabling rich and diverse policies that balance---per-flow---performance and utilization. PR leverages OpenFlow to provide near instant flow-level failure recovery, by dynamically rerouting a flow's packets to its replication target. We have implemented PR and a flow-based HA policy. In controlled experiments, PR sustains checkpoint frequencies of 1000Hz, an order of magnitude improvement over current VM replication solutions. As a result, PR drastically reduces the overhead on end-to-end latency from 280% to 15.5% and throughput overhead from 99.5% to 3.2%.