A decomposition-based simulated annealing technique for data clustering

  • Authors: Kien A. Hua; S. D. Lang; Wen K. Lee

  • Affiliations: University of Central Florida; University of Central Florida; University of Central Florida

  • Venue: PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
  • Year: 1994

Abstract

Simulated annealing has been shown to produce high-quality results for the data clustering problem. However, existing simulated annealing schemes are memory-based algorithms: they are ill-suited to large problems such as data clustering, whose data sets are typically too large to fit entirely in memory. Buffer replacement policies that assume either temporal or spatial locality do not help in this case, because simulated annealing is a randomized search process. The resulting poor locality of reference causes memory to thrash, since too many replacements are required; this incurs excessive disk accesses and forces the machine to run at the speed of the I/O subsystem. In this paper, we formulate the data clustering problem as a graph partition problem (GPP) and propose a decomposition-based approach that addresses the excessive disk accesses during annealing. We apply statistical sampling to randomly select subgraphs of the GPP and load them into memory for annealing. Both analytical and experimental studies indicate that the decomposition-based approach can dramatically reduce costly disk I/O while still obtaining excellent optimized results.
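The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that objects are graph vertices, that edge weights measure how strongly two objects belong together, that a "block" stands in for a disk page, and that each round draws a uniform random sample of blocks to hold in memory. All names (`cut_cost`, `swap_delta`, `anneal_blocks`, `decomposed_anneal`, `clique_graph`), the swap-only move set, and the cooling parameters are hypothetical choices made for illustration.

```python
import math
import random


def cut_cost(adj, part):
    """Total weight of edges whose endpoints sit in different blocks."""
    total = sum(w for u in adj for n, w in adj[u].items() if part[u] != part[n])
    return total / 2  # every undirected edge is stored in both directions


def swap_delta(adj, part, a, b):
    """Change in cut cost if a and b (in different blocks) trade places."""
    pa, pb = part[a], part[b]
    delta = 0.0
    for x, y, own, other in ((a, b, pa, pb), (b, a, pb, pa)):
        for n, w in adj[x].items():
            if n == y:
                continue  # the a-b edge is cut before and after the swap
            if part[n] == own:
                delta += w  # internal edge becomes a cut edge
            elif part[n] == other:
                delta -= w  # cut edge becomes internal
    return delta


def anneal_blocks(adj, part, blocks, rng, t0=1.0, alpha=0.9, steps=200, t_min=1e-3):
    """Anneal only the vertices of the sampled blocks; return the net cost change."""
    resident = [v for v in adj if part[v] in blocks]
    change, t = 0.0, t0
    while t > t_min:
        for _ in range(steps):
            a, b = rng.sample(resident, 2)
            if part[a] == part[b]:
                continue
            d = swap_delta(adj, part, a, b)
            # Metropolis rule: always accept downhill, sometimes accept uphill.
            if d <= 0 or rng.random() < math.exp(-d / t):
                part[a], part[b] = part[b], part[a]
                change += d
        t *= alpha  # geometric cooling (an assumed schedule)
    return change


def decomposed_anneal(adj, part, num_blocks, sample_size, rounds, seed=0):
    """Repeatedly sample a few blocks, anneal them, and keep the best partition."""
    rng = random.Random(seed)
    part = dict(part)  # work on a copy; the caller's partition is untouched
    total, best_total, best = 0.0, 0.0, dict(part)
    for _ in range(rounds):
        blocks = set(rng.sample(range(num_blocks), sample_size))
        total += anneal_blocks(adj, part, blocks, rng)
        if total < best_total:
            best_total, best = total, dict(part)
    return best


def clique_graph(num_cliques, size):
    """Toy input (hypothetical): disjoint cliques with unit edge weights."""
    adj = {v: {} for v in range(num_cliques * size)}
    for c in range(num_cliques):
        members = range(c * size, (c + 1) * size)
        for u in members:
            for v in members:
                if u != v:
                    adj[u][v] = 1.0
    return adj


if __name__ == "__main__":
    adj = clique_graph(4, 4)
    start = {v: v % 4 for v in adj}  # deliberately scatters every clique
    result = decomposed_anneal(adj, start, num_blocks=4, sample_size=2, rounds=30)
    print("cut before:", cut_cost(adj, start), "cut after:", cut_cost(adj, result))
```

In this sketch a swap changes the cut cost only through edges that join the two sampled blocks or lie inside them, so each annealing round needs the partition labels of the resident blocks alone. Restricting moves to swaps also keeps every block at its original size, which mirrors a fixed page capacity. Whether the paper uses the same move set, sampling scheme, or cooling schedule cannot be determined from the abstract.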