Hashing by proximity to process duplicates in spatial databases

  • Authors: Walid G. Aref; Hanan Samet

  • Affiliations: Matsushita Information Technology Laboratory, Two Research Way, Princeton, New Jersey; Computer Science Department and Center for Automation Research and Institute for Advanced Computer Studies, The University of Maryland, College Park, Maryland

  • Venue: CIKM '94: Proceedings of the Third International Conference on Information and Knowledge Management
  • Year: 1994

Abstract

In a spatial database, an object may extend arbitrarily in space. As a result, many spatial data structures (e.g., the quadtree, the cell tree, the R+-tree) represent an object by partitioning it into multiple, yet simple, pieces, each of which is stored separately inside the data structure. Many operations on these data structures are likely to produce duplicate results because of the multiplicity of object pieces. A novel approach for duplicate processing based on proximity of spatial objects is presented. This is different from conventional duplicate elimination in database systems because, with spatial databases, different pieces of the same object can span multiple buckets of the underlying data structure. Example algorithms are presented to perform duplicate processing using proximity for quadtree representation of line segments and arbitrary rectangles. The complexity of the algorithms is seen to depend on a geometric classification of different instances of the spatial objects. By using proximity and the spatial properties of the objects, the number of disk-I/O requests as well as the run-time storage during duplicate processing can be reduced.
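To make the duplicate problem concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a uniform grid of fixed-size buckets as a stand-in for quadtree blocks, and it uses a simple reference-point rule (an assumption introduced here for illustration): during a window query, an object is reported only from the bucket that contains the lower-left corner of the intersection of the object and the window. All names (`CELL`, `build_index`, `window_query`) are hypothetical.

```python
from collections import defaultdict

CELL = 4  # hypothetical bucket side length (stand-in for a quadtree block)


def cells_overlapping(rect):
    """Yield the grid cells touched by a closed rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    for cx in range(int(x0 // CELL), int(x1 // CELL) + 1):
        for cy in range(int(y0 // CELL), int(y1 // CELL) + 1):
            yield cx, cy


def build_index(rects):
    """Store each object's id in every cell it overlaps (one 'piece' per cell)."""
    index = defaultdict(list)
    for oid, rect in rects.items():
        for cell in cells_overlapping(rect):
            index[cell].append(oid)
    return index


def intersects(a, b):
    """Closed-rectangle intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def window_query(index, rects, window):
    """Report each object intersecting the window exactly once, without a
    global 'seen' set, using only local geometry in each visited cell."""
    results = []
    for cell in cells_overlapping(window):
        for oid in index.get(cell, []):
            r = rects[oid]
            if not intersects(r, window):
                continue
            # Reference point: lower-left corner of (object ∩ window).
            ref_x = max(r[0], window[0])
            ref_y = max(r[1], window[1])
            # Only the cell that owns the reference point reports the object.
            if (int(ref_x // CELL), int(ref_y // CELL)) == cell:
                results.append(oid)
    return results
```

For instance, with `rects = {"a": (1, 1, 10, 10), "b": (5, 5, 6, 6)}`, object `a` is stored in nine cells, yet `window_query(build_index(rects), rects, (0, 0, 12, 12))` reports it once. Because the decision uses only the object's geometry and the current bucket, no table of already-reported objects has to be kept during the query, which is the kind of run-time storage saving the abstract alludes to.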