We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree and its associated numbering scheme, which in the split phase allow each processor to compute independently the unique cluster number of a record satisfying an arbitrary query. We show that by restricting the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.
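
The abstract does not define the P-tree or its numbering scheme, so the following is only an illustrative sketch of the split/merge idea under stated assumptions, not the paper's method. It assumes a binary tree in which level i branches on whether a record satisfies query i, with heap-style node numbers (root 1, children 2n and 2n+1), so that each record's cluster number depends on that record alone and sibling clusters differ only in the last bit. The names `cluster_number`, `split`, `merge_siblings`, and the `capacity` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Query = Callable[[int], bool]


def cluster_number(record: int, queries: Sequence[Query]) -> int:
    """Assumed numbering: walk from root 1, going to 2n+1 if the record
    satisfies the query at this level, else to 2n."""
    node = 1
    for q in queries:
        node = 2 * node + (1 if q(record) else 0)
    return node


def split(records: Sequence[int], queries: Sequence[Query]) -> Dict[int, List[int]]:
    """Split phase: every record's number is computed without reference to
    any other record, so on a SIMD machine each processor could take one."""
    clusters: Dict[int, List[int]] = {}
    for r in records:
        clusters.setdefault(cluster_number(r, queries), []).append(r)
    return clusters


def merge_siblings(clusters: Dict[int, List[int]], capacity: int) -> Dict[int, List[int]]:
    """One merge pass that only ever combines sibling clusters (n and n ^ 1)
    into their parent (n // 2), and only if the result fits in `capacity`."""
    merged: Dict[int, List[int]] = {}
    done = set()
    for node in sorted(clusters):
        if node in done:
            continue
        sib = node ^ 1
        if sib in clusters and len(clusters[node]) + len(clusters[sib]) <= capacity:
            merged[node // 2] = clusters[node] + clusters[sib]
            done.update({node, sib})
        else:
            merged[node] = clusters[node]
            done.add(node)
    return merged


# Two hypothetical queries over integer "records".
queries = [lambda x: x < 10, lambda x: x % 2 == 0]
clusters = split([2, 3, 12, 13, 4], queries)
result = merge_siblings(clusters, capacity=2)
```

Because only sibling pairs are candidates, each pair can be examined independently of every other pair, which is the property this sketch is meant to illustrate; the actual merge criterion used in the paper is not stated in the abstract.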