On the performance of object clustering techniques

Authors:
Manolis M. Tsangaris; Jeffrey F. Naughton
Affiliations:
Department of Computer Sciences, University of Wisconsin-Madison; Department of Computer Sciences, University of Wisconsin-Madison
Venue:
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Year:
1992


Abstract

We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, the algorithm that approximated stochastic clustering was different for each workload. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of the clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve a high steady-state performance with a reasonably large cache, the appropriate metric is the number of pages to which the clustering algorithm maps the active portion of the database. For this metric, the PRP clustering algorithm, which uses only access probabilities, achieves nearly optimal performance.
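
The two metrics named in the abstract, and a probability-ranked placement in the spirit of PRP, can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`prp_cluster`, `expansion_factor`, `active_pages`), the greedy page-filling rule, and the reading of the expansion factor as pages touched by a traversal divided by the fewest pages that could hold the traversed objects.

```python
from math import ceil


def prp_cluster(access_prob, sizes, page_size):
    """Hypothetical PRP-style placement: rank objects by access
    probability (highest first) and fill pages in that order."""
    order = sorted(access_prob, key=lambda o: (-access_prob[o], o))
    pages, current, used = [], [], 0
    for obj in order:
        # Start a new page when the next object would not fit.
        if current and used + sizes[obj] > page_size:
            pages.append(current)
            current, used = [], 0
        current.append(obj)
        used += sizes[obj]
    if current:
        pages.append(current)
    return pages


def page_of(pages):
    """Map each object to the index of the page that holds it."""
    return {obj: i for i, page in enumerate(pages) for obj in page}


def expansion_factor(traversal, placement, sizes, page_size):
    """Assumed definition: pages touched by one traversal divided by
    the minimum number of pages its objects could occupy."""
    distinct = set(traversal)
    touched = {placement[o] for o in distinct}
    ideal = ceil(sum(sizes[o] for o in distinct) / page_size)
    return len(touched) / max(ideal, 1)


def active_pages(active_objects, placement):
    """Number of pages onto which the active objects are mapped."""
    return len({placement[o] for o in active_objects})


# Toy data (invented): six unit-size objects, two objects per page.
probs = {"a": 0.30, "b": 0.25, "c": 0.20, "d": 0.15, "e": 0.07, "f": 0.03}
sizes = {o: 1 for o in probs}
pages = prp_cluster(probs, sizes, page_size=2)
placement = page_of(pages)

ef = expansion_factor(["a", "c"], placement, sizes, page_size=2)
hot = active_pages(["a", "b", "c", "d"], placement)
```

On this toy data the four most probable objects land on two pages, which is the behavior the steady-state metric rewards, while a traversal that visits `a` and `c` touches two pages where one would suffice. That mirrors the abstract's point that a placement favorable under one metric need not be favorable under the other.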