Mining for empty spaces in large data sets

  • Authors:
  • Jeff Edmonds;Jarek Gryz;Dongming Liang;Renée J. Miller

  • Affiliations:
  • Department of Computer Science, York University, 4700 Keele Street, Toronto Ont., M3J 1P3, Canada;Department of Computer Science, York University, 4700 Keele Street, Toronto Ont., M3J 1P3, Canada;Department of Computer Science, York University, 4700 Keele Street, Toronto Ont., M3J 1P3, Canada;Department of Computer Science, University of Toronto, Toronto, Ont., Canada

  • Venue:
  • Theoretical Computer Science - Database theory
  • Year:
  • 2003

Abstract

Many data mining approaches focus on the discovery of similar (and frequent) data values in large data sets. We present an alternative, but complementary, approach in which we search for empty regions in the data. We consider the problem of finding all maximal empty rectangles in large, two-dimensional data sets. We introduce a novel, scalable algorithm for finding all such rectangles. The algorithm achieves this with a single scan over a sorted data set and requires only a small bounded amount of memory. We extend the algorithm to find all maximal empty hyper-rectangles in a multi-dimensional space. We consider the complexity of this search problem and present new bounds on the number of maximal empty hyper-rectangles. We give a brief overview of experimental results obtained by applying our algorithm to real and synthetic data sets and describe one application of empty-space knowledge to query optimization.
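
To make the problem statement concrete, here is a minimal brute-force sketch of what "all maximal empty rectangles" means on a small two-dimensional data set. It is not the single-scan algorithm the abstract describes; it only illustrates the definition. It assumes the data set is given as a 0/1 grid (1 where a tuple exists, 0 where none does), and the names `is_empty` and `maximal_empty_rectangles` are hypothetical, chosen for this illustration.

```python
def is_empty(grid, top, bottom, left, right):
    """Return True if every cell in rows top..bottom, columns left..right is 0."""
    return all(
        grid[r][c] == 0
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
    )


def maximal_empty_rectangles(grid):
    """Enumerate all maximal empty rectangles by exhaustive search.

    Assumption: grid is a non-empty list of equal-length rows of 0/1 values.
    Each result is (top, bottom, left, right) with inclusive bounds.
    This brute force is far slower than a single-scan method and is only
    meant for tiny inputs.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    found = []
    for top in range(n_rows):
        for bottom in range(top, n_rows):
            for left in range(n_cols):
                for right in range(left, n_cols):
                    if not is_empty(grid, top, bottom, left, right):
                        continue
                    # An empty rectangle is maximal if it cannot be extended
                    # by one row or one column in any of the four directions.
                    grow_up = top > 0 and is_empty(
                        grid, top - 1, top - 1, left, right)
                    grow_down = bottom < n_rows - 1 and is_empty(
                        grid, bottom + 1, bottom + 1, left, right)
                    grow_left = left > 0 and is_empty(
                        grid, top, bottom, left - 1, left - 1)
                    grow_right = right < n_cols - 1 and is_empty(
                        grid, top, bottom, right + 1, right + 1)
                    if not (grow_up or grow_down or grow_left or grow_right):
                        found.append((top, bottom, left, right))
    return found


if __name__ == "__main__":
    # Toy data set: 1 marks an existing tuple, 0 marks empty space.
    example = [
        [0, 0, 1],
        [0, 0, 0],
        [1, 0, 0],
    ]
    for rect in maximal_empty_rectangles(example):
        print(rect)
```

In the toy grid, the two occupied cells leave four maximal empty rectangles that overlap one another, which is why the number of such rectangles is itself a quantity worth bounding.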