Multiresolution select-distinct queries on large geographic point sets

  • Authors: Sarana Nutanong; Marco D. Adelfio; Hanan Samet
  • Affiliations: University of Maryland, College Park, MD (all three authors)
  • Venue: Proceedings of the 20th International Conference on Advances in Geographic Information Systems
  • Year: 2012

Abstract

Many spatial applications require the ability to display the locations of data entries on an online map. For example, an online photo-sharing service may wish to display photos according to where they were taken. Since many photos can occupy the same area and overlap each other within a display window, less popular or older images (based on a given measure of importance) can be discarded so that the more popular or newer photos become more distinct. A straightforward solution to this problem is (i) to use a window query to retrieve the data entries within a given display window, and (ii) to discard every data entry that lies in proximity of a more important one. This method works well in a high spatial selectivity setting, i.e., when the window query returns a small number of entries, but its performance degrades drastically as the spatial selectivity decreases. We consider this problem as selecting distinct data entries from a given dataset, where the "distinctiveness" of a data entry depends on its importance relative to that of other data entries in proximity. In this paper, we propose a new query type called the multi-resolution select-distinct (MRSD) query. The main novelty of our query processing method is a voting system built upon an ensemble of interrelated indexes, which allows us to efficiently determine the degree of distinctiveness of all points within a query window. Using a real dataset of over 9 million locations, our experimental results show that our proposed method consistently produces sub-second response times, while the window query-based method takes more than 10 seconds on average in a low spatial selectivity setting.
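
To make the straightforward window-query solution concrete, the following is a minimal sketch of one possible reading of it, not the MRSD method proposed in the paper. The names `Entry`, `window_query`, `select_distinct_baseline`, and the parameter `min_dist` are hypothetical, and the sketch assumes that "in proximity" means Euclidean distance below a fixed threshold and that entries are discarded greedily in decreasing order of importance. The linear scan stands in for whatever spatial index a real system would use.

```python
from math import hypot
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """A data entry with a location and an importance score (hypothetical type)."""
    x: float
    y: float
    importance: float


def window_query(entries: List[Entry],
                 window: Tuple[float, float, float, float]) -> List[Entry]:
    """Step (i): return the entries inside window = (xmin, ymin, xmax, ymax).

    A linear scan is used here for simplicity; it is an assumption of this
    sketch, not a statement about how the paper implements the window query.
    """
    xmin, ymin, xmax, ymax = window
    return [e for e in entries
            if xmin <= e.x <= xmax and ymin <= e.y <= ymax]


def select_distinct_baseline(entries: List[Entry],
                             window: Tuple[float, float, float, float],
                             min_dist: float) -> List[Entry]:
    """Step (ii): discard entries that lie near a more important kept entry.

    Assumption: "proximity" means Euclidean distance < min_dist, and entries
    are processed greedily from most to least important.
    """
    candidates = sorted(window_query(entries, window),
                        key=lambda e: e.importance, reverse=True)
    kept: List[Entry] = []
    for e in candidates:
        # Keep e only if no already-kept (more important) entry is too close.
        if all(hypot(e.x - k.x, e.y - k.y) >= min_dist for k in kept):
            kept.append(e)
    return kept
```

In this sketch, every entry returned by the window query is compared against the entries kept so far, so the work grows with the number of entries inside the window. That is consistent with the observation above that the approach degrades when the window query returns many entries.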