HANet: a framework toward ultimately reliable network services

Authors:
Ying-Jie Jiang;Da-Wei Chang;Ruei-Chuan Chang
Affiliations:
Department of Computer and Information Science, National Chiao-Tung University, 1001 Ta Hsueh Road, HsinChu, Taiwan 30050, ROC;Department of Computer and Information Science, National Chiao-Tung University, 1001 Ta Hsueh Road, HsinChu, Taiwan 30050, ROC;Department of Computer and Information Science, National Chiao-Tung University, 1001 Ta Hsueh Road, HsinChu, Taiwan 30050, ROC
Venue:
Journal of Systems and Software
Year:
2005

Abstract

High availability is becoming essential for network services because even brief downtime can cause substantial financial loss. According to previous research, network failure is one of the major causes of system unavailability. In this paper, we propose a framework called HANet for building highly available network services. The main goal of HANet is to allow a server to continue providing services even when all of its network interfaces to the outside world (i.e., public interfaces) have failed. This is achieved through two techniques. First, a network interface can be backed up not only by other public network interfaces, but also by inter-server I/O communication interfaces (i.e., private interfaces) such as Ethernet, USB, or RS232. IP packets can therefore still be transmitted and received over these I/O links, even when all of the public network interfaces have failed. Second, HANet allows a server to take over the packet transmission job of another server whose network has failed. The benefit of HANet is that a network-failed server does not lose any requests that are being processed. It is also efficient, since no synchronization overhead or replay process is required. Moreover, it is totally transparent to server applications and clients. To demonstrate the feasibility of HANet, we implemented it in the Linux kernel. According to the performance results, using a private Fast Ethernet interface for data communication adds only 1% overhead to user-perceived latency when the public Fast Ethernet interface of the server has failed. This indicates that HANet is efficient and hence feasible for commercial network services.
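
The two techniques named in the abstract can be pictured with a small user-space sketch. It is not the HANet kernel implementation: the `Interface` record, the `choose_route` function, and the `peer_public_up` flag are hypothetical names introduced only for illustration, and the selection order (any working public interface first, then any working private link to a peer that can still reach the outside) is an assumption about how such a fallback might be arranged.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interface:
    """Hypothetical description of one link on a server."""
    name: str
    public: bool      # True: faces clients; False: inter-server (private) link
    up: bool = True


def choose_route(local: List[Interface],
                 peer_public_up: bool) -> Optional[Tuple[str, str]]:
    """Pick an outbound link for an IP packet.

    Returns (interface name, mode), or None if no path to the outside exists.
    """
    # Technique 1a: a public interface is backed up by other public interfaces.
    for nic in local:
        if nic.public and nic.up:
            return nic.name, "direct"
    # Technique 1b + 2: fall back to a private link, so that a peer server
    # takes over packet transmission (assumes the peer's public side works).
    if peer_public_up:
        for nic in local:
            if not nic.public and nic.up:
                return nic.name, "relay-via-peer"
    return None


if __name__ == "__main__":
    nics = [Interface("eth0", public=True),
            Interface("eth1", public=False),
            Interface("usb0", public=False)]
    print(choose_route(nics, peer_public_up=True))
    nics[0].up = False  # simulate failure of the only public interface
    print(choose_route(nics, peer_public_up=True))
```

In this sketch the application never sees which link was chosen, which mirrors the transparency claim in the abstract; how HANet actually detects failures and redirects packets inside the kernel is not described here.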