Finding Regions of Interest in Large Scientific Datasets

  • Authors:
  • Rishi Rakesh Sinha; Marianne Winslett; Kesheng Wu

  • Affiliations:
  • Microsoft Corporation, Redmond; Department of Computer Science, University of Illinois at Urbana-Champaign; Lawrence Berkeley National Laboratory

  • Venue:
  • SSDBM 2009: Proceedings of the 21st International Conference on Scientific and Statistical Database Management
  • Year:
  • 2009

Abstract

Consider a scientific range query such as "find all places in Africa where yesterday the temperature was over 35 degrees and it rained." In theory, one can answer such a query by returning every geographic point that satisfies the query conditions. In practice, users do not find this low-level answer very useful; they need the points consolidated into regions, i.e., sets of points that all satisfy the query conditions and are adjacent in the underlying mesh. In this paper, we show that when a high-quality index is used to find the points and a good traditional connected component labeling algorithm is used to build the regions, the cost of consolidating the points into regions dominates range query response time. We then show how to find the query result points and consolidate them into regions in expected time that is sublinear in the number of result points. This seemingly miraculous speedup comes from two phases: a point lookup phase that uses bitmap indexes and produces a compressed bitmap as the intermediate query result, followed by a region consolidation phase that operates directly on that compressed bitmap and exploits the spatial properties of the underlying mesh to greatly reduce the cost of consolidating the result points into regions. Our experiments with real-world scientific data demonstrate that, in practice, our approach to region consolidation is over 10 times faster than a traditional connected component algorithm.
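To illustrate why consolidating regions directly on a compressed result can cost less than visiting every result point, the following is a minimal sketch and not the authors' algorithm. It assumes a regular 2D grid stored in row-major order with 4-neighbor adjacency, and it stands in for the compressed bitmap with a plain list of half-open runs of set bits; the abstract specifies neither the compression format nor the mesh type. All function names (`runs_from_bits`, `split_by_row`, `label_regions`) are hypothetical. The labeling step touches each run once instead of each point, so its work grows with the number of runs.

```python
def runs_from_bits(bits):
    """Encode a flat 0/1 sequence as half-open (start, end) runs of 1s.
    Only used to build demo input; a real index would emit runs directly."""
    runs, start = [], None
    for i, b in enumerate(bits):
        if b and start is None:
            start = i
        elif not b and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(bits)))
    return runs


def split_by_row(runs, width):
    """Cut runs at row boundaries so each piece is (row, col_start, col_end)."""
    pieces = []
    for start, end in runs:
        while start < end:
            row = start // width
            row_end = min(end, (row + 1) * width)
            pieces.append((row, start - row * width, row_end - row * width))
            start = row_end
    return pieces


def label_regions(runs, width):
    """Group runs into 4-connected regions using union-find over runs.
    Assumes `runs` is sorted and non-overlapping."""
    pieces = split_by_row(runs, width)
    parent = list(range(len(pieces)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Index pieces by row; order within a row follows column order.
    rows = {}
    for idx, (row, _, _) in enumerate(pieces):
        rows.setdefault(row, []).append(idx)

    # Two-pointer sweep: merge runs in adjacent rows whose columns overlap.
    for row, cur in rows.items():
        prev = rows.get(row - 1, [])
        i = j = 0
        while i < len(prev) and j < len(cur):
            _, p0, p1 = pieces[prev[i]]
            _, c0, c1 = pieces[cur[j]]
            if p0 < c1 and c0 < p1:
                ra, rb = find(prev[i]), find(cur[j])
                if ra != rb:
                    parent[rb] = ra
            # Advance whichever run ends first.
            if p1 <= c1:
                i += 1
            else:
                j += 1

    regions = {}
    for idx, piece in enumerate(pieces):
        regions.setdefault(find(idx), []).append(piece)
    return list(regions.values())


# Demo: a 3x4 grid whose set points form two separate regions.
bits = [1, 1, 0, 0,
        0, 1, 0, 1,
        0, 0, 0, 1]
regions = label_regions(runs_from_bits(bits), width=4)
print(len(regions))  # 2
```

In this sketch a run that wraps past the end of a row is split first, because the last column of one row and the first column of the next are consecutive in storage but not adjacent in the grid. An unstructured mesh would need a different adjacency test.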