Adaptive sampling for geometric problems over data streams

  • Authors: John Hershberger; Subhash Suri
  • Affiliations: Mentor Graphics Corp., 8005 SW Boeckman Road, Wilsonville, OR 97070, USA; Computer Science Department, University of California, Santa Barbara, CA 93106, USA
  • Venue: Computational Geometry: Theory and Applications
  • Year: 2008

Abstract

Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, and scientific measurements of earth or atmospheric phenomena. This paper focuses on the problem of summarizing such geometric data streams using limited storage so that many natural geometric queries can be answered faithfully. Examples of such queries include: report the smallest convex region in which a chemical leak has been sensed, track the diameter of the dataset, or track the extent of the dataset in any given direction. One can also pose queries over multiple streams: for instance, track the minimum distance between the convex hulls of two data streams, report when datasets A and B are no longer linearly separable, or report when points of data stream A become completely surrounded by points of data stream B. These queries are easily extended to more than two streams. In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r+1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r^2), where D is the diameter of the sample set. The amortized time for processing each point in the stream is O(log r). Using the sample convex hull, all the queries mentioned above can be answered approximately in either O(log r) or O(r) time.
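
To make the idea of a small extremal sample concrete, the following is a minimal sketch of a simpler baseline, not the adaptive scheme the abstract describes: it keeps the most extreme stream point in each of r fixed, evenly spaced directions and answers extent and diameter queries from those points alone. The class name DirectionalSample and its methods are hypothetical, chosen only for illustration, and this fixed-direction baseline makes no claim to the O(D/r^2) error bound or the O(log r) update time stated above.

```python
import math


class DirectionalSample:
    """Hypothetical baseline: one extreme stream point per fixed direction.

    This is NOT the paper's adaptive sampling; directions are fixed and
    evenly spaced, and each insert costs O(r).
    """

    def __init__(self, r):
        # r unit vectors evenly spaced around the circle.
        self.dirs = [
            (math.cos(2 * math.pi * j / r), math.sin(2 * math.pi * j / r))
            for j in range(r)
        ]
        # Most extreme point seen so far in each direction.
        self.best = [None] * r

    def insert(self, p):
        """Process one stream point p = (x, y) in a single pass."""
        for j, (dx, dy) in enumerate(self.dirs):
            q = self.best[j]
            if q is None or p[0] * dx + p[1] * dy > q[0] * dx + q[1] * dy:
                self.best[j] = p

    def sample(self):
        """Distinct sample points; their hull approximates the true hull."""
        return {q for q in self.best if q is not None}

    def extent(self, ux, uy):
        """Approximate extent of the stream along unit direction (ux, uy)."""
        dots = [x * ux + y * uy for x, y in self.sample()]
        return max(dots) - min(dots)

    def diameter(self):
        """Approximate diameter: largest pairwise distance in the sample."""
        pts = list(self.sample())
        return max(
            math.hypot(a[0] - b[0], a[1] - b[1]) for a in pts for b in pts
        )


if __name__ == "__main__":
    s = DirectionalSample(8)
    for point in [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]:
        s.insert(point)
    print(sorted(s.sample()))   # the interior point (2, 1) is not retained
    print(s.extent(1.0, 0.0))   # extent along the x-axis
    print(s.diameter())         # diagonal of the 4-by-3 rectangle
```

The sample never holds more than r points however long the stream runs, which is the limited-storage property the abstract asks for. The adaptive scheme differs in that it chooses its sample directions according to the shape of the data rather than fixing them in advance.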