We consider the impact of different communication architectures on the performability (performance + availability) of cluster-based servers. In particular, we use a combination of fault-injection experiments and analytic modeling to evaluate the performability of two popular communication protocols, TCP and VIA, as the intra-cluster communication substrate of a sophisticated Web server. Our analysis leads to several interesting conclusions, the most surprising of which is that, under the same fault load, VIA-based servers deliver greater availability than TCP-based servers. If we assume higher fault rates for VIA-based servers because the underlying technology is less mature and the programming model more complex, we find that packet errors or application faults would have to occur at approximately 4 times the rate in TCP-based servers before their performabilities equalize. Based on the results of the study, we suggest that high-performance, robust communication layers for highly available cluster-based servers should preserve message boundaries rather than use byte streams, use single-copy transfers, pre-allocate channel resources, and report errors in a manner consistent with the network fabric's fault model.
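To make the first recommendation concrete, the following is a minimal sketch of what an application has to do when the transport is a byte stream rather than a message-preserving channel. It is not taken from the server studied here. The 4-byte length prefix and the helper names `send_message`, `recv_exact`, `recv_message`, and `demo` are assumptions chosen for illustration, and a local socket pair stands in for an intra-cluster TCP connection.

```python
import socket
import struct

# Assumed framing for this sketch: a 4-byte big-endian length prefix.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Write one length-prefixed message to a byte-stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream may return them in smaller pieces."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # The stream ended in the middle of a message.
            raise ConnectionError("peer closed the stream mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Rebuild one message boundary from the byte stream."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo() -> list:
    """Send two messages back to back and recover them as separate units."""
    sender, receiver = socket.socketpair()  # stand-in for a TCP connection
    try:
        send_message(sender, b"GET /index.html")
        send_message(sender, b"GET /logo.png")
        return [recv_message(receiver), recv_message(receiver)]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

The reassembly loop and the mid-message failure case in `recv_exact` are the kind of bookkeeping that a boundary-preserving layer would handle below the application, which is one plausible reading of why the abstract favors message boundaries over byte streams.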