Exploiting duality in summarization with deterministic guarantees

  • Authors: Panagiotis Karras; Dimitris Sacharidis; Nikos Mamoulis
  • Affiliations: Hong Kong University; National Technical University of Athens; Hong Kong University
  • Venue: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • Year: 2007

Abstract

Summarization is an important task in data mining. A major challenge over the past years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$, but also on the space budget $B$. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that dispels these deficiencies, thanks to a fruitful application of the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state-of-the-art, our histogram construction algorithm reduces time complexity by (at least) a $\frac{B \log^2 n}{\log \epsilon^*}$ factor and our hierarchical synopsis algorithm reduces the complexity by (at least) a factor of $\frac{\log^2 B}{\log \epsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\epsilon^*$ is the optimal error. These complexity advantages offer both a space-efficiency and a scalability that previous approaches lacked. We verify the benefits of our approach in practice by experimentation.
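
The duality idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm; it assumes a one-dimensional sequence of integers, a histogram whose buckets each store a single representative value, and the maximum absolute error as the metric. The function names `min_buckets` and `min_error_for_budget` are hypothetical. The first function solves the dual problem (fewest buckets for a given error bound) with a greedy scan, and the second solves the space-bounded problem by binary search over candidate error values, so no table of space allocations indexed by B is needed.

```python
def min_buckets(data, eps):
    """Dual problem: fewest buckets so that every value lies within eps
    of its bucket's representative (the midpoint of the bucket's range)."""
    buckets = 0
    lo = hi = None
    for x in data:
        if lo is None:
            # First value opens the first bucket.
            lo = hi = x
            buckets = 1
            continue
        new_lo, new_hi = min(lo, x), max(hi, x)
        if (new_hi - new_lo) / 2 <= eps:
            # x still fits: extend the current bucket.
            lo, hi = new_lo, new_hi
        else:
            # x would exceed the error bound: start a new bucket.
            buckets += 1
            lo = hi = x
    return buckets


def min_error_for_budget(data, budget):
    """Space-bounded problem: smallest maximum error achievable with at
    most `budget` buckets (assumes integer data and budget >= 1).

    With integer data every bucket error is a multiple of 0.5, so the
    search runs over integers k with eps = k / 2."""
    lo_k, hi_k = 0, max(data) - min(data)
    while lo_k < hi_k:
        mid = (lo_k + hi_k) // 2
        if min_buckets(data, mid / 2) <= budget:
            hi_k = mid          # feasible: try a smaller error
        else:
            lo_k = mid + 1      # infeasible: a larger error is required
    return lo_k / 2


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12, 30, 31]
    print(min_error_for_budget(values, 3))
```

Each call to `min_buckets` is a single pass over the data, and the number of calls grows with the logarithm of the value range rather than with the budget, which is the kind of independence from B that the abstract emphasizes.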