Shark: scaling file servers via cooperative caching

Authors:
Siddhartha Annapureddy;Michael J. Freedman;David Mazières
Affiliations:
New York University;New York University;New York University
Venue:
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Year:
2005

Abstract

Network file systems offer a powerful, transparent interface for accessing remote data. Unfortunately, in current network file systems like NFS, clients fetch data from a central file server, inherently limiting the system's ability to scale to many clients. While recent distributed (peer-to-peer) systems have managed to eliminate this scalability bottleneck, they are often exceedingly complex and provide non-standard models for administration and accountability. We present Shark, a novel system that retains the best of both worlds: the scalability of distributed systems with the simplicity of central servers. Shark is a distributed file system designed for large-scale, wide-area deployment, while also providing a drop-in replacement for local-area file systems. Shark introduces a novel cooperative-caching mechanism, in which mutually-distrustful clients can exploit each other's file caches to reduce load on an origin file server. Using a distributed index, Shark clients find nearby copies of data, even when files originate from different servers. Performance results show that Shark can greatly reduce server load and improve client latency for read-heavy workloads in both the wide and local areas, while still remaining competitive for single clients in the local area. Thus, Shark enables modestly-provisioned file servers to scale to hundreds of read-mostly clients while retaining traditional usability, consistency, security, and accountability.
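
The cooperative-caching idea in the abstract can be illustrated with a minimal sketch. This is not Shark's actual protocol. It assumes that chunks are named by a content hash the client already trusts (for example, one obtained from the origin server), that SHA-256 is the hash, and that an in-memory dictionary stands in for the distributed index. The names `Peer`, `chunk_id`, and `fetch_chunk` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


def chunk_id(data: bytes) -> str:
    """Content hash used as the chunk's name (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical client that serves chunks out of its local cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, bytes] = {}

    def get(self, cid: str) -> Optional[bytes]:
        return self.cache.get(cid)


def fetch_chunk(
    cid: str,
    me: Peer,
    index: Dict[str, List[Peer]],
    origin_fetch: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), trying peers listed in the index before the origin."""
    data: Optional[bytes] = None
    source = "origin"
    for peer in index.get(cid, []):
        candidate = peer.get(cid)
        # Peers are untrusted: accept data only if it hashes to the expected ID.
        if candidate is not None and chunk_id(candidate) == cid:
            data, source = candidate, peer.name
            break
    if data is None:
        data = origin_fetch(cid)  # no valid peer copy: fall back to the origin server
    me.cache[cid] = data
    holders = index.setdefault(cid, [])
    if me not in holders:
        holders.append(me)  # advertise our copy so later readers can skip the origin
    return data, source
```

In this sketch, the first reader of a chunk contacts the origin and every later reader can be served by a peer, which is the load reduction the abstract describes. A peer that returns tampered data fails the hash check and is skipped, so distrust between clients does not compromise what is read.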