Clustering declustered data for efficient retrieval

  • Authors:
  • Hakan Ferhatosmanoglu;Divyakant Agrawal;Amr El Abbadi

  • Affiliations:
  • Department of Computer Science, University of California, Santa Barbara, CA;Department of Computer Science, University of California, Santa Barbara, CA;Department of Computer Science, University of California, Santa Barbara, CA

  • Venue:
  • Proceedings of the eighth international conference on Information and knowledge management
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern databases increasingly integrate new kinds of information, such as multimedia information in the form of image, video, and audio data. Both the dimensionality and the amount of data that need to be processed is increasing rapidly, increasing the demand for the efficient retrieval of large amounts of multi-dimensional data. Declustering techniques for multi-disk architectures have been effectively used for storage. In this paper, we first establish that besides exploiting the parallelism, a careful organization of each disk must be considered for fast searching. We introduce the notion of page allocation and data space mapping which can be used to organize and retrieve multidimensional data. We develop these notions based on three different partitioning strategies: regular grid partitioning, concentric hypercubes and hyperpyramids. We develop techniques that satisfy efficient retrieval by optimizing the number of buckets retrieved by the query, disk arm movement and I/O parallelism. We prove that concentric hypercube-based mapping satisfies the optimal clustering and optimal parallelism. We develop a technique based on hyperpyramid partitioning that reduces the number of buckets retrieved by the query and has efficient inter- and intra-disk organizations. We evaluate the performance of proposed techniques by comparing them with the current approaches. The new techniques lead to very significant improvement over the existing techniques, and result in fast retrieval of multi-dimensional data.