An Efficient Data Location Protocol for Self-organizing Storage Clusters

Authors:
Hong Tang;Tao Yang
Affiliations:
University of California, Santa Barbara;University of California, Santa Barbara
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

Component additions and failures are common in large-scale storage clusters in production environments. To improve availability and manageability, we investigate and compare data location schemes for a large self-organizing storage cluster that can quickly adapt to the addition or departure of storage nodes. We further present an efficient location scheme that differentiates between small and large file blocks to reduce management overhead compared with uniform strategies. In our protocol, small blocks, which are typically numerous, are placed through consistent hashing. Large blocks, which are far fewer in practice, are placed through a usage-based policy, and their locations are tracked by Bloom filters. The proposed scheme improves storage utilization even when cluster nodes are non-uniform. To achieve high scalability and fault resilience, the protocol is fully distributed, relies only on soft state, and supports data replication. We demonstrate the effectiveness and efficiency of the protocol through trace-driven simulation.
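The two location mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class names, the SHA-256 hash, the 64 ring points per node, the filter size, and the 64 KB small/large threshold are all assumptions chosen for illustration. The usage-based placement policy for large blocks is not described in the abstract and is therefore not modeled; the sketch only shows how a lookup might consult a hash ring for small blocks and per-node Bloom filters for large ones.

```python
import bisect
import hashlib

SMALL_BLOCK_LIMIT = 64 * 1024  # hypothetical threshold, not taken from the paper


def _hash(key: str, salt: str = "") -> int:
    """Map a string to a 64-bit integer (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256((salt + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    """Assigns each key to the first node point clockwise from the key's hash."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Several points per node smooth out the load distribution.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def locate(self, block_id):
        if not self._ring:
            raise LookupError("no storage nodes in the ring")
        idx = bisect.bisect_right(self._ring, (_hash(block_id), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        return [_hash(key, salt=str(i)) % self.num_bits
                for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self._bits |= 1 << pos

    def __contains__(self, key):
        return all((self._bits >> pos) & 1 for pos in self._positions(key))


def candidate_nodes(block_id, size, ring, filters):
    """Return the nodes that may hold a block.

    Small blocks resolve to exactly one node via the ring. Large blocks
    return every node whose filter reports a match, so the result can
    include false positives that the caller has to verify.
    """
    if size < SMALL_BLOCK_LIMIT:
        return [ring.locate(block_id)]
    return [node for node, bf in filters.items() if block_id in bf]
```

In this sketch, removing a node from the ring relocates only the small blocks that the node held, which is the property that makes consistent hashing attractive when storage nodes join or leave. A Bloom filter match is only a hint: a client would still need to confirm with the candidate node that it holds the block.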