Should SDBMS support a join index?: a case study from CrimeStat

  • Authors:
  • Pradeep Mohan;Ronald E. Wilson;Shashi Shekhar;Betsy George;Ned Levine;Mete Celik

  • Affiliations:
  • University of Minnesota;National Institute of Justice, Washington, D.C.;University of Minnesota;University of Minnesota;Ned Levine and Associates, Houston, TX;University of Minnesota

  • Venue:
  • Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems
  • Year:
  • 2008

Abstract

Given a spatial crime data warehouse that is updated infrequently, a set of operations O, and constraints on storage and update overhead, the index type selection problem is to find a set of index types that reduces the I/O cost of those operations. The problem matters for user experience and system resource utilization in spatial statistics application domains such as mapping and analysis for public safety, public health, ecology, and transportation, because an effective choice of index types can significantly improve the response time of frequent queries built on these operations. Many spatial statistical queries in these domains use a spatial neighborhood matrix, known as W in spatial statistics, which can be thought of as a spatial self-join in spatial database terminology. Currently supported index types, such as the B-Tree and R-Tree families, do not adequately support spatial statistical analysis because they require on-the-fly computation of the W matrix, which slows the analysis down. In contrast, this paper argues that Spatial Database Management Systems (SDBMS) should support a join index that materializes the W matrix and eliminates on-the-fly computation of this common self-join. A detailed case study using CrimeStat, a popular spatial statistical software package for public safety, shows that join indices can significantly speed up spatial analyses such as the calculation of Ripley's K and the identification of hotspots.
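
The idea of materializing the self-join can be illustrated with a minimal Python sketch. It is not the paper's or CrimeStat's implementation: the function names `build_join_index` and `ripley_k` are hypothetical, the index is built by brute force over all pairs, and the K estimate assumes the simple uncorrected form (study area divided by n squared, times the number of ordered point pairs within distance d, with no edge correction). The point is only that once the pairs are stored, each query becomes a lookup rather than a fresh distance computation.

```python
import bisect
import math
from itertools import combinations


def build_join_index(points, max_dist):
    """Materialize the spatial self-join once (hypothetical helper).

    Returns every unordered pair (i, j), i < j, whose Euclidean distance
    is at most max_dist, as (distance, i, j) tuples sorted by distance.
    Brute force is used here purely for illustration.
    """
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d <= max_dist:
            pairs.append((d, i, j))
    pairs.sort()
    return pairs


def ripley_k(sorted_distances, n, area, d):
    """Uncorrected Ripley's K at distance d, read from the stored pairs.

    Assumes K(d) = area / n^2 * (number of ordered pairs within d).
    Each stored unordered pair counts twice, once per direction.
    d must not exceed the max_dist used to build the index.
    """
    unordered = bisect.bisect_right(sorted_distances, d)
    return area * (2 * unordered) / (n * n)


# Build the index once, then answer several distance queries from it.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
index = build_join_index(points, max_dist=2.0)
distances = [entry[0] for entry in index]
k_values = {d: ripley_k(distances, len(points), 100.0, d) for d in (1.0, 1.5)}
```

Because the warehouse is updated infrequently, the cost of building and storing the pairs is paid rarely, while every Ripley's K query at a different distance reuses the same sorted list.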