Archipelago: an Island-based file system for highly available and scalable internet services

Authors:
Minwen Ji;Edward W. Felten;Randolph Wang;Jaswinder Pal Singh
Affiliations:
Department of Computer Science, Princeton University;Department of Computer Science, Princeton University;Department of Computer Science, Princeton University;Department of Computer Science, Princeton University
Venue:
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
Year:
2000

Abstract

Maintaining availability in the face of failures is a critical requirement for Internet services. Existing approaches in cluster-based data storage rely on redundancy to survive a small number of failures, but the system becomes entirely unavailable if more failures occur. We describe an approach that allows a cluster file server to isolate failures so that the system can continue to serve most clients. Our approach is complementary to existing redundancy-based methods: redundancy can mask the first few failures, and failure isolation can take over and maintain availability for the majority of clients if more failures occur. The building blocks of our design are self-contained and load-balanced file servers called islands. The main idea underlying island-based design is the one-island principle: as many operations as possible should involve exactly one island. The one-island principle provides failure isolation because each island can function independently of other islands' failures. It also helps the file system scale with system and workload size because communication and synchronization across islands are reduced. We implemented a prototype island-based file system called Archipelago on a cluster of PCs running Windows NT 4.0 connected by Ethernet. Microbenchmark measurements show that Archipelago adds little overhead to NTFS and Win32 RPC performance, while measurements of operation mixes based on NTFS traces show a speedup of 15.7 on 16 islands.
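
To make the one-island principle concrete, the following is a minimal sketch, not the Archipelago implementation. The abstract does not say how data is assigned to islands, so the hash-based placement of whole directories used here is an assumption chosen for illustration, and the names `island_for`, `serve`, and `available_fraction` are hypothetical. The sketch shows only the failure-isolation property: an operation on a directory touches exactly one island, so it succeeds whenever that island is up, regardless of failures elsewhere.

```python
import hashlib


def island_for(directory, num_islands):
    """Map a directory path to exactly one island.

    Assumption: placement by hashing the directory path. The abstract
    does not specify the placement scheme used by Archipelago.
    """
    digest = hashlib.sha256(directory.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_islands


def serve(directory, num_islands, failed):
    """Return True if an operation on `directory` can be served.

    Under the one-island principle the operation involves only the
    directory's home island, so failures of other islands are irrelevant.
    """
    return island_for(directory, num_islands) not in failed


def available_fraction(directories, num_islands, failed):
    """Fraction of directories still reachable given a set of failed islands."""
    served = sum(1 for d in directories if serve(d, num_islands, failed))
    return served / len(directories)


if __name__ == "__main__":
    # Hypothetical workload: 1000 user directories spread over 16 islands.
    dirs = [f"/home/user{i}" for i in range(1000)]
    for failed in (set(), {3}, {3, 7, 11}):
        frac = available_fraction(dirs, 16, failed)
        print(f"failed islands {sorted(failed)}: {frac:.1%} of directories available")
```

With roughly even placement, losing k of n islands leaves about (n - k)/n of the directories reachable, which is the sense in which the system "can continue to serve most clients". The same independence is what allows near-linear scaling: the reported speedup of 15.7 on 16 islands corresponds to a parallel efficiency of 15.7/16, or about 98%.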