Managing Statistical Behavior of Large Data Sets in Shared-Nothing Architectures

Authors:
Isidore Rigoutsos; Alex Delis
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Citing 33
Cited 1

Abstract

Increasingly large data sets are being stored in networked architectures, yet many of the available data structures are not easily amenable to parallel realizations. Hashing schemes are promising in this respect because the underlying data structure can be decomposed and spread among the cooperating nodes with minimal communication and maintenance requirements. In all cases, however, storage utilization and load balancing must be addressed. There are two basic approaches to the problem: address it as part of the design of the data structure used to store and retrieve the data, or keep the data structure intact and address the problem separately. The method we present here falls into the latter category and applies whenever a hash table is the preferred data structure. Every hash table has an associated hashing function that partitions a possibly unbounded set of data items into a finite set of groups by assigning each data item to one of the groups. In general, the hashing function cannot guarantee that the groups will have the same expected cardinality for all possible data item distributions. In this paper, we propose a two-stage methodology that uses knowledge of the hashing function to reorganize the group assignments so that the resulting groups have similar expected cardinalities. The method is generally applicable and independent of the hashing function used. We demonstrate the methodology on both synthetic and real-world databases; the resulting quasi-uniform storage occupancy and the associated load-balancing gains are significant.
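The abstract describes the two-stage idea only in outline. The Python sketch below illustrates one possible reading of it under stated assumptions, and it is not the authors' algorithm. Stage one estimates, from a data sample, how many items a fixed hashing function sends to each of its buckets. Stage two leaves the hashing function untouched and remaps buckets to groups (for example, storage nodes) with a greedy largest-first heuristic so that expected group cardinalities come out similar. The helper names (`estimate_bucket_counts`, `remap_buckets`, `group_sizes`, `order_preserving_hash`), the 64-bucket/8-group sizes, the skewed Beta(2, 8) data, and the greedy heuristic itself are all hypothetical choices made for illustration.

```python
import heapq
import random
from collections import Counter

def estimate_bucket_counts(sample, hash_fn, num_buckets):
    """Stage 1 (illustrative assumption): count how many sampled items
    the given hashing function sends to each of its buckets."""
    counts = Counter(hash_fn(item) % num_buckets for item in sample)
    return [counts.get(b, 0) for b in range(num_buckets)]

def remap_buckets(bucket_counts, num_groups):
    """Stage 2 (illustrative assumption): reassign buckets to groups so
    that expected group cardinalities are similar. Greedy largest-first:
    the heaviest remaining bucket goes to the currently lightest group."""
    heap = [(0, g) for g in range(num_groups)]  # (expected load, group id)
    heapq.heapify(heap)
    mapping = [0] * len(bucket_counts)
    for b in sorted(range(len(bucket_counts)), key=bucket_counts.__getitem__, reverse=True):
        load, g = heapq.heappop(heap)
        mapping[b] = g
        heapq.heappush(heap, (load + bucket_counts[b], g))
    return mapping

def group_sizes(data, hash_fn, num_buckets, mapping, num_groups):
    """Count how many items of `data` land in each group under `mapping`."""
    sizes = [0] * num_groups
    for item in data:
        sizes[mapping[hash_fn(item) % num_buckets]] += 1
    return sizes

NUM_BUCKETS, NUM_GROUPS = 64, 8

def order_preserving_hash(x):
    """Hypothetical hashing function: maps x in [0, 1) to one of 64 equal-width cells."""
    return int(x * NUM_BUCKETS)

rng = random.Random(42)
sample = [rng.betavariate(2, 8) for _ in range(20_000)]  # skewed towards small x
data = [rng.betavariate(2, 8) for _ in range(20_000)]    # held-out items to store

# Baseline: each group receives a contiguous block of 8 buckets.
naive_map = [b // (NUM_BUCKETS // NUM_GROUPS) for b in range(NUM_BUCKETS)]
balanced_map = remap_buckets(
    estimate_bucket_counts(sample, order_preserving_hash, NUM_BUCKETS), NUM_GROUPS)

naive_sizes = group_sizes(data, order_preserving_hash, NUM_BUCKETS, naive_map, NUM_GROUPS)
balanced_sizes = group_sizes(data, order_preserving_hash, NUM_BUCKETS, balanced_map, NUM_GROUPS)
print("naive:   ", naive_sizes)
print("balanced:", balanced_sizes)
```

Because only the bucket-to-group table changes, the sketch reflects the abstract's claim that the approach is independent of the hashing function: any function returning an integer bucket index could be passed as `hash_fn`. With the skewed data, the contiguous baseline leaves the high-index groups nearly empty, whereas the remapped table keeps all eight groups close to the mean of 2,500 items.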