A parallel algorithm for record clustering

Authors:
Edward Omiecinski;Peter Scheuermann
Affiliations:
Georgia Institute of Technology, Atlanta;Northwestern Univ., Evanston, IL
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1990

Abstract

We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree and its associated numbering scheme, which in the split phase allows each processor to compute independently the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.
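
The abstract does not define the P-tree or its numbering scheme, so the following Python sketch is only an illustration of the general split/merge idea, not the authors' method. It assumes a complete binary tree with heap-style node numbers (root 1, children 2n and 2n+1), one query per level, and a hypothetical page capacity as the merge criterion. The names `cluster_number`, `split`, `merge_siblings`, and `capacity` are all invented for this example.

```python
from collections import defaultdict


def cluster_number(record, queries):
    """Assumed numbering: start at the root (1) and, for each query,
    descend to child 2n+1 if the record satisfies it, else to child 2n.
    The computation uses only the record itself, so every record could
    be handled by a separate processor."""
    node = 1
    for query in queries:
        node = 2 * node + (1 if query(record) else 0)
    return node


def split(records, queries):
    """Split phase: group records by their independently computed number."""
    clusters = defaultdict(list)
    for record in records:  # iterations are independent of one another
        clusters[cluster_number(record, queries)].append(record)
    return dict(clusters)


def merge_siblings(clusters, capacity):
    """One merge pass that only combines sibling clusters (n and n ^ 1),
    storing the result under their parent n >> 1 when it fits in
    `capacity` records (a hypothetical criterion)."""
    merged = {}
    handled = set()
    for n in sorted(clusters):
        if n in handled:
            continue
        sibling = n ^ 1
        fits = (sibling in clusters
                and len(clusters[n]) + len(clusters[sibling]) <= capacity)
        if fits:
            merged[n >> 1] = clusters[n] + clusters[sibling]
            handled.update((n, sibling))
        else:
            merged[n] = clusters[n]
            handled.add(n)
    return merged


if __name__ == "__main__":
    # Hypothetical records and queries, for illustration only.
    queries = [lambda r: r["age"] > 30, lambda r: r["dept"] == "db"]
    records = [
        {"age": 25, "dept": "db"},
        {"age": 40, "dept": "db"},
        {"age": 40, "dept": "os"},
        {"age": 20, "dept": "os"},
    ]
    leaves = split(records, queries)
    print(sorted(leaves))
    print(sorted(merge_siblings(leaves, capacity=2)))
```

Because siblings share a parent, each candidate pair can be examined without consulting any other cluster, which is the property that makes a sibling-only merge easy to distribute across processors in this sketch.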