Autonomous database partitioning using data mining on single computers and cluster computers

  • Authors:
  • Liangzhe Li;Le Gruenwald

  • Affiliations:
  • University of Oklahoma, Norman, OK;University of Oklahoma, Norman, OK

  • Venue:
  • Proceedings of the 16th International Database Engineering & Applications Symposium
  • Year:
  • 2012

Abstract

One of the most important metrics of database system performance is query response time, which consists of I/O time and CPU time. I/O time is determined by the amount of data read from or written to disk and by how that data is laid out on disk. CPU time is determined by how the database system executes the query operations. To reduce query response time, we can therefore reduce I/O time, CPU time, or both. Retrieving data from disk is much slower than retrieving it from main memory. Hence, a common way to reduce I/O time is to cluster data on disk so that queries access only relevant data. This paper introduces an efficient algorithm, called AutoClust, for automatic database attribute clustering (also called automatic database vertical partitioning) on single computers as well as cluster computers. It is based on closed item sets mined from queries and their attributes using association rule mining. The paper then presents experimental results comparing the performance of AutoClust with that of a baseline algorithm on both single computers and cluster computers, using the TPC-H benchmark running on major commercial database systems. The experiments show that AutoClust achieves lower query costs on both types of computers.
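To make the notion of closed item sets concrete, the following minimal sketch treats each query as the set of attributes it accesses and enumerates the closed item sets by brute force. It is not the AutoClust algorithm itself, whose details the abstract does not give. The workload, the attribute names, and the function names `support` and `closed_itemsets` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical workload: each query is represented by the attributes it reads.
queries = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"d"},
]


def support(itemset, transactions):
    """Count the transactions (queries) that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def closed_itemsets(transactions, min_support=1):
    """Return {itemset: support} for all frequent closed item sets.

    An item set is closed when no proper superset has the same support.
    Brute force over all subsets, so suitable only for small attribute counts.
    """
    items = sorted(set().union(*transactions))

    # Collect every item set that meets the minimum support.
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                frequent[candidate] = count

    # Keep only item sets with no equally supported proper superset.
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


result = closed_itemsets(queries)
```

For this toy workload the closed item sets are {a, b} with support 3, {a, b, c} with support 2, and {d} with support 1. Attributes that always appear together in queries end up in the same closed item set, which is what makes such sets plausible candidates for vertical partitions.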