Disk Allocation for Fast Range and Nearest-Neighbor Queries

  • Authors:
  • Sunil Prabhakar; Divyakant Agrawal; Amr El Abbadi

  • Affiliations:
  • Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA. sunil@cs.purdue.edu; Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. agrawal@cs.ucsb.edu; Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. amr@cs.ucsb.edu

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2003

Abstract

As databases increasingly integrate non-textual multimedia information, it is becoming necessary to support efficient similarity searching in addition to range searching. Range and nearest-neighbor (similarity) queries are the most important classes of queries for multimedia and multi-dimensional databases. Due to the large sizes of the datasets involved, I/O is a critical factor limiting performance. The use of parallel I/O through declustering of the data is a promising approach to improving performance. Consequently, several research efforts have addressed the problem of declustering multidimensional data to optimize range and partial match queries. Very limited work has been done for similarity queries, and the problem of declustering for combined range and similarity queries has not been addressed in the literature.

Consider a dataset of images where the following metadata is also stored for each image: the date on which the picture was taken, and the longitude and latitude of the site of the picture. An example of a combined query is: given a target image, find the 5 most similar images taken within 3 months of the target image and located within 2 degrees of longitude and latitude of the target image. To answer this query, it is necessary to conduct a range search on the date, longitude, and latitude values and a similarity search on the image content.

In this paper, we develop new declustering schemes that provide good declustering for similarity searching. In addition, we show that the new schemes have very good performance for range queries as well as combination queries. The new schemes are based upon the Cyclic declustering schemes, which were developed for range and partial match queries. The Cyclic schemes not only provide superior performance to earlier schemes, but are also very robust and consistent with respect to query types and variations in system parameters.
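
The combined query described in the abstract can be illustrated with a minimal in-memory sketch. It shows only the query semantics (a range filter on date, longitude, and latitude, followed by a k-nearest-neighbor search on image content) and says nothing about how the paper's declustering schemes place data on disks. The `ImageRecord` type, the `combined_query` function, the use of Euclidean distance on a feature tuple, and the approximation of "3 months" as 90 days are all assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta
import heapq
import math


@dataclass
class ImageRecord:
    """Hypothetical record: image metadata plus a content feature vector."""
    name: str
    taken: date
    lon: float
    lat: float
    features: tuple  # assumed numeric descriptor of the image content


def combined_query(target, records, k=5, max_days=90, max_deg=2.0):
    """Return the k records most similar to target among those in range.

    Assumptions: "within 3 months" is approximated as max_days=90, and
    similarity is Euclidean distance between feature vectors.
    """
    window = timedelta(days=max_days)
    # Range search on the date, longitude, and latitude attributes.
    in_range = [
        r for r in records
        if r is not target
        and abs(r.taken - target.taken) <= window
        and abs(r.lon - target.lon) <= max_deg
        and abs(r.lat - target.lat) <= max_deg
    ]
    # Similarity (nearest-neighbor) search restricted to the range result.
    return heapq.nsmallest(
        k, in_range, key=lambda r: math.dist(r.features, target.features)
    )
```

A call such as `combined_query(target, records)` uses the defaults from the abstract's example (5 neighbors, roughly 3 months, 2 degrees). A real system would answer both parts through a multi-dimensional index over declustered disk pages rather than a linear scan.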