Practical and low-overhead masking of failures of TCP-based servers

  • Authors:
  • Dmitrii Zagorodnov; Keith Marzullo; Lorenzo Alvisi; Thomas C. Bressoud

  • Affiliations:
  • University of California, Santa Barbara, Santa Barbara, CA; University of California, San Diego, La Jolla, CA; The University of Texas at Austin, Austin, TX; Denison University, Granville, OH

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 2009

Abstract

This article describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. Our approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the software running on the clients. Furthermore, it runs on commodity hardware. We compare two implementations of this architecture (one based on primary/backup replication and another based on message logging) focusing on scalability, failover time, and application transparency. We evaluate three types of services: a file server, a Web server, and a multimedia streaming server. Our experiments suggest that the approach incurs low overhead on throughput, scales well as the number of clients increases, and allows recovery of the service in near-optimal time.