Evaluating the Impact of Communication Architecture on the Performability of Cluster-Based Services

Authors:
Kiran Nagaraja; Neeraj Krishnan; Ricardo Bianchini; Richard P. Martin; Thu D. Nguyen
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003

Abstract

We consider the impact of different communication architectures on the performability (performance + availability) of cluster-based servers. In particular, we use a combination of fault-injection experiments and analytic modeling to evaluate the performability of two popular communication protocols, TCP and VIA, as the intra-cluster communication substrate of a sophisticated Web server. Our analysis leads to several interesting conclusions, the most surprising of which is that, under the same fault load, VIA-based servers deliver greater availability than TCP-based servers. If we assume higher fault rates for VIA-based servers because the underlying technology is more immature and the programming model more complex, we find that packet errors or application faults would have to occur at approximately 4 times the rate in TCP-based servers before their performabilities equalize. Based on the results of the study, we suggest that high-performance, robust communication layers for highly available cluster-based servers should preserve message boundaries rather than expose byte streams, use single-copy transfers, pre-allocate channel resources, and report errors in a manner consistent with the network fabric's fault model.
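
The recommendation to preserve message boundaries can be illustrated with a minimal sketch. It is not taken from the paper: the 4-byte length prefix and the names `frame` and `Deframer` are assumptions chosen for illustration. The sketch shows the extra buffering and parsing that an application must do to recover message boundaries when the transport delivers only a byte stream, work that a message-oriented transport would not require of the application.

```python
import struct
from typing import List

# Assumed framing format for this sketch: a 4-byte big-endian length prefix.
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prepend the length prefix so a receiver can find the message boundary."""
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Rebuilds whole messages from arbitrarily split byte-stream chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every message completed so far."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


def deliver_in_chunks(stream: bytes, chunk_size: int) -> List[bytes]:
    """Simulate a byte stream that ignores message boundaries."""
    deframer = Deframer()
    received = []
    for start in range(0, len(stream), chunk_size):
        received.extend(deframer.feed(stream[start:start + chunk_size]))
    return received
```

For example, `deliver_in_chunks(frame(b"hello") + frame(b"world!"), 3)` recovers the two original messages even though no chunk lines up with a message boundary. A partially received message stays buffered until the remaining bytes arrive, so the application must also decide how to handle a peer that fails in mid-message.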