Similarity search on a large collection of point sets

  • Authors: Marco D. Adelfio, Sarana Nutanong, Hanan Samet
  • Affiliations: University of Maryland, College Park, MD (all authors)
  • Venue: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
  • Year: 2011

Abstract

Spatial applications often require similarity search over a collection of point sets. For example, given the geographical distribution of a disease outbreak, find the k historical outbreaks with the most similar spatial distributions in a data collection D. In this paper, we study the problem of similarity search over a collection of point sets using the Hausdorff distance, a measure commonly used to determine the maximum discrepancy between two point sets. To avoid computing the Hausdorff distance for every point set S in D, one may compute an optimistic estimate (i.e., a lower bound) of the actual Hausdorff distance HausDist(Q,S) for each S and use it to rule out sets that are obviously dissimilar to the query point set Q. In our investigation, we observed that a commonly used method for computing this estimate (called BscLB) may not produce a result that is indicative of the actual Hausdorff distance. Consequently, we propose a method (called EnhLB) that produces a tighter estimate than the existing one. We then formulate a similarity search algorithm that uses a combination of BscLB and EnhLB to find similar point sets efficiently. We also extend our method to support an outlier-resistant variant of the Hausdorff distance called the modified Hausdorff distance. We compare our proposed algorithm with an algorithm that uses only BscLB. The results of our experiments show a reduction in computation time of 72% for searches using the Hausdorff distance and of 53% for searches using the modified Hausdorff distance.
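
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's BscLB or EnhLB: the lower bound below is a generic bounding-rectangle bound, chosen here only because it is simple to verify, and all function names (`hausdorff`, `mbr_lower_bound`, `knn_search`) are hypothetical. The sketch assumes 2D points with Euclidean distance, non-empty point sets, and Python 3.8 or later (for `math.dist`).

```python
import heapq
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty 2D point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def rect_mindist(r1, r2):
    """Smallest distance between two axis-aligned rectangles."""
    dx = max(0.0, r1[0] - r2[2], r2[0] - r1[2])
    dy = max(0.0, r1[1] - r2[3], r2[1] - r1[3])
    return math.hypot(dx, dy)


def edges(r):
    """The four edges of a rectangle, each as a degenerate rectangle."""
    x0, y0, x1, y1 = r
    return [(x0, y0, x0, y1), (x1, y0, x1, y1),
            (x0, y0, x1, y0), (x0, y1, x1, y1)]


def mbr_lower_bound(ra, rb):
    """Generic lower bound on the Hausdorff distance (an assumption of this
    sketch, not the paper's method). Every edge of a set's bounding
    rectangle touches at least one of its points, and that point is at
    least rect_mindist(edge, other rectangle) from every point of the
    other set."""
    d_ab = max(rect_mindist(e, rb) for e in edges(ra))
    d_ba = max(rect_mindist(e, ra) for e in edges(rb))
    return max(d_ab, d_ba)


def knn_search(query, collection, k):
    """Return the k nearest sets as sorted (distance, index) pairs,
    computing the exact distance only when the lower bound cannot
    rule a candidate out."""
    rq = mbr(query)
    candidates = sorted(
        (mbr_lower_bound(rq, mbr(s)), i) for i, s in enumerate(collection)
    )
    best = []  # max-heap on distance, stored as (-distance, index)
    for lb, i in candidates:
        if len(best) == k and lb >= -best[0][0]:
            break  # all remaining candidates have lower bounds at least this large
        d = hausdorff(query, collection[i])
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best)
```

Candidates are visited in increasing order of their lower bound, so the loop can stop as soon as the next bound is no smaller than the current k-th best exact distance. A tighter bound lets that stopping condition fire earlier, which is the kind of saving the abstract attributes to EnhLB.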