A decomposition-based simulated annealing technique for data clustering

Authors:
Kien A. Hua; S. D. Lang; Wen K. Lee
Affiliations:
University of Central Florida; University of Central Florida; University of Central Florida
Venue:
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Year:
1994

Abstract

It has been demonstrated that simulated annealing provides high-quality results for the data clustering problem. However, existing simulated annealing schemes are memory-based algorithms. They are not suited to large problems such as data clustering, whose instances are typically too big to fit in memory in their entirety. Buffer replacement policies that assume either temporal or spatial locality do not help here, because simulated annealing is based on a randomized search process. The resulting poor locality of reference causes the memory to thrash, since too many buffer replacements are required. This incurs excessive disk accesses and forces the machine to run at the speed of the I/O subsystem. In this paper, we formulate the data clustering problem as a graph partition problem (GPP) and propose a decomposition-based approach that addresses the issue of excessive disk accesses during annealing. We apply statistical sampling to randomly select subgraphs of the GPP and bring them into memory for annealing. Both analytical and experimental studies indicate that the decomposition-based approach can dramatically reduce costly disk I/O activity while still obtaining excellent optimized results.
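To make the idea concrete, here is a minimal, hypothetical sketch of decomposition-based annealing for a graph partition formulation. It is not the authors' algorithm. It assumes that objects are nodes, that edge weights are affinities (for example, co-access frequencies), that each partition is a fixed-capacity page, and that the goal is to minimize the weight of edges cut across pages. At each stage, a few pages are sampled at random to stand in for "the subgraph loaded into memory." Annealing then tries only swaps between nodes on those sampled pages, using the Metropolis acceptance rule and geometric cooling. Every name and parameter (`decomposed_anneal`, `pages_per_sample`, `alpha`, and so on) is an illustrative assumption. For simplicity, the sketch keeps the whole graph in memory.

```python
import math
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency map {node: {neighbor: weight}} from {(u, v): w}."""
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    return adj


def page_map(pages):
    """Map each node to the index of the page (partition) that holds it."""
    return {node: p for p, nodes in enumerate(pages) for node in nodes}


def cut_weight(edges, part):
    """Assumed GPP objective: total weight of edges whose endpoints sit on different pages."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


def swap_delta(adj, part, a, b):
    """Change in cut weight if nodes a and b (on different pages) trade pages."""
    pa, pb = part[a], part[b]
    delta = 0
    for n, w in adj[a].items():
        if n != b:  # the a-b edge is cut both before and after the swap
            delta += w * ((part[n] != pb) - (part[n] != pa))
    for n, w in adj[b].items():
        if n != a:
            delta += w * ((part[n] != pa) - (part[n] != pb))
    return delta


def decomposed_anneal(edges, pages, pages_per_sample=2, samples=200,
                      moves_per_sample=50, t0=1.0, alpha=0.95, seed=0):
    """Hypothetical sketch: anneal one randomly sampled group of pages at a time.

    Assumes non-empty pages. Swaps keep every page at its original size, so the
    page-capacity constraint is always satisfied.
    """
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    pages = [list(p) for p in pages]  # work on a copy; the input is not mutated
    part = page_map(pages)
    t = t0
    for _ in range(samples):
        # The sampled pages play the role of the subgraph read into memory.
        chosen = rng.sample(range(len(pages)), pages_per_sample)
        for _ in range(moves_per_sample):
            p, q = rng.sample(chosen, 2)  # two distinct sampled pages
            i, j = rng.randrange(len(pages[p])), rng.randrange(len(pages[q]))
            a, b = pages[p][i], pages[q][j]
            d = swap_delta(adj, part, a, b)
            # Metropolis rule: always accept improvements, sometimes accept uphill moves.
            if d <= 0 or rng.random() < math.exp(-d / t):
                pages[p][i], pages[q][j] = b, a
                part[a], part[b] = q, p
        t *= alpha  # geometric cooling after each sampled subproblem
    return pages


if __name__ == "__main__":
    # Four 3-node clusters, initially spread so that every cluster edge is cut.
    edges = {}
    for base in (0, 3, 6, 9):
        a, b, c = base, base + 1, base + 2
        edges[(a, b)] = edges[(b, c)] = edges[(a, c)] = 1
    start = [[i for i in range(12) if i % 4 == p] for p in range(4)]
    result = decomposed_anneal(edges, start)
    print(cut_weight(edges, page_map(start)), cut_weight(edges, page_map(result)))
```

Restricting candidate swaps to a few sampled pages is what would bound the working set in an out-of-core setting. The trade-off is that each stage explores only part of the neighborhood, so the number of samples and the cooling rate (both assumed values here) govern solution quality.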