Declustering using error correcting codes

Authors:
C. Faloutsos; D. Metaxas
Affiliations:
University of Maryland, College Park and University of Maryland Institute for Advanced Computer Studies (UMIACS); University of Maryland, College Park and University of Toronto, Ontario, Canada
Venue:
PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Year:
1989

References (10)

Disk allocation methods for binary Cartesian product files. BIT.
Parallel free-text search on the connection machine system. Communications of the ACM, Special issue on parallelism.
Data placement in Bubba. SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data.
A case for redundant arrays of inexpensive disks (RAID). SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data.
The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems (TODS).
Disk allocation for Cartesian product files on multiple-disk systems. ACM Transactions on Database Systems (TODS).
Optimal partial-match retrieval when fields are independently specified. ACM Transactions on Database Systems (TODS).
Parallel searching for binary Cartesian product files. CSC '85 Proceedings of the 1985 ACM thirteenth annual conference on Computer Science.
Attribute based file organization in a paged memory environment. Communications of the ACM.
GAMMA - A High Performance Dataflow Database Machine. VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases.

Cited by (17)

Semantic complexity of classes of relational queries and query independent data partitioning. PODS '91 Proceedings of the tenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
Multi-disk B-trees. SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data.
Optimal disk allocation for partial match queries. ACM Transactions on Database Systems (TODS).
Efficient disk allocation for fast similarity searching. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures.
On the optimality of disk allocation for Cartesian product files (extended abstract). PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
CMD: A Multidimensional Declustering Method for Parallel Data Systems. VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases.
Hamming Filters: A Dynamic Signature File Organization for Parallel Stores. VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases.
Optimal Parallel I/O for Range Queries through Replication. DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications.
Disk Allocation for Fast Range and Nearest-Neighbor Queries. Distributed and Parallel Databases.
Replicated declustering for arbitrary queries. Proceedings of the 2004 ACM symposium on Applied computing.
Replicated declustering of spatial data. PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Efficient retrieval of replicated data. Distributed and Parallel Databases.
Efficient parallel processing of range queries through replicated declustering. Distributed and Parallel Databases.
Threshold-based declustering. Information Sciences: an International Journal.
Equivalent disk allocations. Proceedings of the 2007 ACM symposium on Applied computing.
Divide-and-conquer scheme for strictly optimal retrieval of range queries. ACM Transactions on Storage (TOS).
Threshold based declustering in high dimensions. DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications.

Abstract

The problem examined is how to distribute a binary Cartesian product file over multiple disks so as to maximize the parallelism for partial match queries. Cartesian product files arise from some secondary-key access methods, such as multiattribute hashing [10] and the grid file [6]. For the binary case, the problem reduces to grouping the 2^n binary strings of n bits into m groups of dissimilar strings, one group per disk. The main idea proposed in this paper is to group the strings so that each group forms an Error Correcting Code (ECC). This construction guarantees that the strings in a given group have large Hamming distances, i.e., they differ in many bit positions. Intuitively, this should result in good declustering. We briefly mention previous heuristics for declustering, describe exactly how to build a declustering scheme from an ECC, and prove a theorem that gives a necessary condition for our method to be optimal. Analytical results show that our method is superior to older heuristics and that it comes very close to the theoretical (non-tight) bound.
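The abstract does not spell out the construction, so the following is only a minimal Python sketch of one common way to decluster with a linear code. It assumes m = 2^k disks and a code given by a k x n parity-check matrix H over GF(2). Each bucket (an n-bit string) goes to the disk numbered by its syndrome H·x (mod 2). Each group is then a coset of the code, so any two buckets on the same disk differ by a nonzero codeword and are therefore at least the code's minimum distance apart. The matrix H (here that of the [7,4] Hamming code), the function names, and the response-time measure (largest number of qualifying buckets on any one disk, compared with the lower bound ceil(q/m)) are illustrative assumptions, not necessarily the paper's exact formulation.

```python
from itertools import combinations, product
from math import ceil

# Hypothetical parity-check matrix of the [7,4] Hamming code:
# n = 7 attribute bits, k = 3 rows, so m = 2**3 = 8 disks.
# Its columns are the 7 distinct nonzero 3-bit vectors, giving minimum distance 3.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def disk_of(bucket, H):
    """Return the disk number of a bucket: its syndrome H*x mod 2, read as an integer."""
    disk = 0
    for row in H:
        bit = sum(h & b for h, b in zip(row, bucket)) % 2
        disk = (disk << 1) | bit
    return disk

def groups(H, n):
    """Map each disk number to the list of buckets (n-bit tuples) stored on it."""
    g = {}
    for bucket in product((0, 1), repeat=n):
        g.setdefault(disk_of(bucket, H), []).append(bucket)
    return g

def hamming(a, b):
    """Number of bit positions in which two buckets differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(group):
    """Smallest Hamming distance between two buckets on the same disk."""
    return min(hamming(a, b) for a, b in combinations(group, 2))

def response_time(query, H):
    """Evaluate a partial match query: 0/1 for specified bits, None for unspecified.

    Returns (max qualifying buckets on any one disk, lower bound ceil(q / m)).
    """
    m = 2 ** len(H)
    counts = [0] * m
    free = [i for i, v in enumerate(query) if v is None]
    for bits in product((0, 1), repeat=len(free)):
        bucket = list(query)
        for i, b in zip(free, bits):
            bucket[i] = b
        counts[disk_of(bucket, H)] += 1
    q = 2 ** len(free)
    return max(counts), ceil(q / m)

if __name__ == "__main__":
    g = groups(H, 7)
    print(len(g), {len(v) for v in g.values()})                # 8 {16}
    print(min(min_distance(v) for v in g.values()))            # 3
    print(response_time((None, None, 0, None, 1, 0, 1), H))    # (1, 1)
    print(response_time((None, None, None, 0, 0, 0, 0), H))    # (2, 1)
```

The two queries show that such a layout is not optimal for every partial match query. When the columns of H at the unspecified positions are linearly independent (positions 0, 1 and 3 here), the 8 qualifying buckets land on 8 different disks. When they are dependent (columns 0, 1 and 2 sum to zero, so 1110000 is a codeword), pairs of qualifying buckets collide on the same disk.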