LH*—a scalable, distributed data structure

Authors:
Witold Litwin; Marie-Anna Neimat; Donovan A. Schneider
Affiliations:
Hewlett-Packard Labs, Palo Alto, CA; Hewlett-Packard Labs, Palo Alto, CA; Hewlett-Packard Labs, Palo Alto, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1996

Abstract

We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log₂ M) rounds of messages.

We first describe the basic LH* scheme, where a coordinator site manages bucket splits and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65%–70% regardless of file size and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80%–95%. These values are about the same as those of LH, but the load factor for LH* varies more.

We next define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on average, but becomes more variable, as cascading splits are needed to prevent file overload. Next, we briefly describe two variants of the splitting policy, using parallel splits and presplitting, that should enhance performance for high-performance applications.

Altogether, we show that LH* files can efficiently scale to files that are orders of magnitude larger than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files. Finally, LH* files can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.
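
The abstract does not spell out the addressing rule, so the following is only a minimal single-site sketch of the classic linear hashing rule that LH* builds on, assuming integer keys and the hash family h_i(C) = C mod (N · 2^i). The class and method names are hypothetical, and the distributed parts of LH* (clients, servers, the coordinator, message forwarding) are deliberately left out. It illustrates how a file grows through splits of one bucket at a time.

```python
class LinearHashFile:
    """Single-site linear hashing sketch; each bucket stands in for one server.

    Hypothetical illustration only: no clients, messages, or coordinator.
    """

    def __init__(self, initial_buckets=1):
        self.n0 = initial_buckets      # N: number of buckets at level 0
        self.level = 0                 # i: current file level
        self.split_ptr = 0             # n: index of the next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _h(self, level, key):
        # Hash family h_i(C) = C mod (N * 2^i), assuming integer keys.
        return key % (self.n0 * 2 ** level)

    def address(self, key):
        # Buckets below the split pointer have already been split,
        # so their keys are addressed with the next-level function.
        a = self._h(self.level, key)
        if a < self.split_ptr:
            a = self._h(self.level + 1, key)
        return a

    def insert(self, key):
        self.buckets[self.address(key)].append(key)

    def split(self):
        # Split exactly one bucket: the one under the split pointer.
        old = self.split_ptr
        self.buckets.append([])        # new bucket gets index old + N * 2^i
        keys, self.buckets[old] = self.buckets[old], []
        self.split_ptr += 1
        if self.split_ptr == self.n0 * 2 ** self.level:
            # Every bucket of this level has been split: start the next level.
            self.level += 1
            self.split_ptr = 0
        for k in keys:
            self.buckets[self.address(k)].append(k)


f = LinearHashFile(initial_buckets=1)
for key in range(8):
    f.insert(key)
for _ in range(3):
    f.split()
print(f.buckets)
```

Starting from a single bucket holding keys 0 through 7, three splits leave four buckets, `[[0, 4], [1, 5], [2, 6], [3, 7]]`, with the file at level 2 and the split pointer back at 0. In this sketch splits are triggered by hand; when to split (on every collision, or under load control) is a policy choice that the abstract discusses separately.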