Summarizing spatial data streams using ClusterHulls

  • Authors:
  • John Hershberger;Nisheeth Shrivastava;Subhash Suri

  • Affiliations:
  • Mentor Graphics Corp., Wilsonville, Ohio;Bell-Labs Research, Bangalore, India;University of California, Santa Barbara, Santa Barbara, California

  • Venue:
  • Journal of Experimental Algorithmics (JEA)
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider the following problem: given an on-line, possibly unbounded stream of two-dimensional (2D) points, how can we summarize its spatial distribution or shape using a small, bounded amount of memory? We propose a novel scheme, called ClusterHull, which represents the shape of the stream as a dynamic collection of convex hulls, with a total of at most m vertices, where m is the size of the memory. The algorithm dynamically adjusts both the number of hulls and the number of vertices in each hull to best represent the stream using its fixed-memory budget. This algorithm addresses a problem whose importance is increasingly recognized, namely, the problem of summarizing real-time data streams to enable on-line analytical processing. As a motivating example, consider habitat monitoring using wireless sensor networks. The sensors produce a steady stream of geographic data, namely, the locations of objects being tracked. In order to conserve their limited resources (power, bandwidth, and storage), the sensors can compute, store, and exchange ClusterHull summaries of their data, without losing important geometric information. We are not aware of other schemes specifically designed for capturing shape information in geometric data streams and so we compare ClusterHull with some of the best general-purpose clustering schemes, such as CURE, k-medians, and LSEARCH. We show through experiments that ClusterHull is able to represent the shape of two-dimensional data streams more faithfully and flexibly than the stream versions of these clustering algorithms.