Adaptive sampling for geometric problems over data streams

  • Authors:
  • John Hershberger; Subhash Suri

  • Affiliations:
  • Mentor Graphics Corp., Wilsonville, OR; University of California, Santa Barbara, CA

  • Venue:
  • PODS '04: Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
  • Year:
  • 2004

Abstract

Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, and scientific measurements of earth or atmospheric phenomena. How can one summarize such data streams using limited storage so that many natural geometric queries can be answered faithfully? Some examples of such queries are: report the smallest convex region in which a chemical leak has been sensed, or track the diameter of the dataset. One can also pose queries over multiple streams: track the minimum distance between the convex hulls of two data streams, or report when datasets A and B are no longer linearly separable.

In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r + 1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r²), where D is the diameter of the sample set. With our sample convex hull, all the queries mentioned above can be answered in either O(log r) or O(r) time.
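To make the idea of a single-pass convex hull sample concrete, here is a minimal sketch, assuming Python 3.8+. It does not reproduce the paper's adaptive scheme. Instead it implements the simpler, non-adaptive baseline of keeping, for each of k uniformly spaced directions, the stream point with the largest projection onto that direction, and then answering a diameter query on the convex hull of that sample. The class and function names (`DirectionalExtremaSketch`, `convex_hull`, `diameter`) and the parameter k are hypothetical illustrations; k is not the paper's r, and no error bound is claimed for this baseline.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class DirectionalExtremaSketch:
    """Hypothetical non-adaptive baseline (not the paper's adaptive sample):
    for each of k uniformly spaced unit directions, keep the stream point
    with the largest projection onto that direction."""

    def __init__(self, k: int) -> None:
        if k < 3:
            raise ValueError("need at least 3 directions")
        self.dirs = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
                     for i in range(k)]
        self.best: List[Optional[Point]] = [None] * k
        self.best_val = [-math.inf] * k

    def insert(self, p: Point) -> None:
        # Single pass: O(k) work per arriving point, O(k) storage overall.
        for i, (ux, uy) in enumerate(self.dirs):
            v = p[0] * ux + p[1] * uy
            if v > self.best_val[i]:
                self.best_val[i] = v
                self.best[i] = p

    def sample(self) -> List[Point]:
        # One point may be extreme in several directions, so deduplicate.
        return sorted({p for p in self.best if p is not None})


def convex_hull(points: List[Point]) -> List[Point]:
    # Andrew's monotone chain; returns hull vertices counterclockwise,
    # dropping collinear points.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    upper: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def diameter(points: List[Point]) -> float:
    # Brute force over hull vertices, O(h^2); rotating calipers would give O(h).
    hull = convex_hull(points)
    return max((math.dist(a, b) for a in hull for b in hull), default=0.0)


if __name__ == "__main__":
    sk = DirectionalExtremaSketch(k=8)
    for p in [(0.5, 0.5), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
        sk.insert(p)
    print(sk.sample(), diameter(sk.sample()))
```

In this usage example the interior point (0.5, 0.5) is discarded, and the diameter of the sample equals the diagonal of the unit square. The adaptive scheme described in the abstract differs from this baseline in that it chooses which directions to sample based on the data rather than fixing them in advance.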