Exploring spatial datasets with histograms

  • Authors: Chengyu Sun; Nagender Bandi; Divyakant Agrawal; Amr El Abbadi

  • Affiliations: Department of Computer Science, California State University, Los Angeles; Department of Computer Science, University of California, Santa Barbara; Department of Computer Science, University of California, Santa Barbara; Department of Computer Science, University of California, Santa Barbara

  • Venue: Distributed and Parallel Databases
  • Year: 2006

Abstract

As online spatial datasets grow in both number and sophistication, it becomes increasingly difficult for users to decide whether a dataset is suitable for their tasks, especially when they have no prior knowledge of the dataset. In this paper, we propose browsing as an effective and efficient way to explore the content of a spatial dataset. Browsing allows users to view the size of a result set before evaluating the query at the database, thereby avoiding zero-hit and mega-hit queries and saving time and resources. Although the underlying technique supporting browsing is similar to range query aggregation and selectivity estimation, spatial dataset browsing poses some unique challenges. We identify a set of spatial relations that need to be supported in browsing applications, namely the contains, contained, and overlap relations. We prove a lower bound on the storage required to answer queries about the contains relation accurately at a given resolution. We then present three storage-efficient approximation algorithms, which we believe to be the first to estimate query results for these spatial relations. We evaluate these algorithms on both synthetic and real-world datasets and show that they provide highly accurate estimates for datasets with various characteristics.
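
To make the three relations concrete, the following minimal sketch computes exact per-relation result sizes for a query window by brute force. It is not one of the paper's approximation algorithms, which the abstract does not describe; it only illustrates the quantities such algorithms would estimate. The names `Rect`, `covers`, `intersects`, and `browse_counts` are hypothetical. The sketch assumes axis-aligned rectangles, reads "contains" as the query enclosing the object and "contained" as the query lying inside the object, and treats the three relations as mutually exclusive; all of these are assumptions rather than definitions taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with x1 <= x2 and y1 <= y2 (illustrative type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def covers(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies entirely within `outer` (boundaries included)."""
    return (outer.x1 <= inner.x1 and outer.y1 <= inner.y1
            and outer.x2 >= inner.x2 and outer.y2 >= inner.y2)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def browse_counts(objects, query: Rect) -> dict:
    """Exact result sizes per relation for one query window.

    Assumed interpretation (not from the abstract):
      contains  - the query encloses the object
      contained - the query lies inside the object
      overlap   - the two intersect but neither encloses the other
    """
    counts = {"contains": 0, "contained": 0, "overlap": 0}
    for obj in objects:
        if covers(query, obj):
            counts["contains"] += 1
        elif covers(obj, query):
            counts["contained"] += 1
        elif intersects(obj, query):
            counts["overlap"] += 1
    return counts


objects = [
    Rect(1, 1, 2, 2),      # inside the query window
    Rect(0, 0, 10, 10),    # encloses the query window
    Rect(4, 4, 6, 6),      # partially overlaps the query window
    Rect(20, 20, 21, 21),  # disjoint from the query window
]
query = Rect(0.5, 0.5, 5, 5)
result = browse_counts(objects, query)
print(result)
```

A browsing interface would show counts like these before the query runs, so a user can tell a zero-hit or mega-hit window apart from a useful one. The brute-force scan touches every object, which is the cost that a compact histogram summary is meant to avoid.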