SketchTree: Approximate Tree Pattern Counts over Streaming Labeled Trees

  • Authors: Praveen Rao; Bongki Moon
  • Affiliations: University of Arizona; University of Arizona
  • Venue: ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
  • Year: 2006

Abstract

In recent years, there has been a rising interest in developing online approximation algorithms for data streams. Some of the key challenges are posed by the fact that streaming data can be read only once in a fixed order of arrival and only a limited amount of memory is available for storage. In this paper, we address the problem of approximately counting tree patterns over a stream of labeled trees (e.g., XML documents). We propose a new approximation algorithm called SketchTree that computes a synopsis of the stream in a single pass by processing each tree only once. Using a limited amount of memory, SketchTree provides approximate answers for both ordered and unordered tree pattern counts. Furthermore, we discuss a class of count queries that can be handled by SketchTree and their utility. We provide theoretical analyses to show that our algorithm has provably strong guarantees on the error bounds. Experiments on real datasets demonstrate that SketchTree can indeed estimate tree pattern counts within 10-15% relative error with high confidence under various situations.
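To make the single-pass, bounded-memory setting concrete, the following is a minimal sketch of a streaming synopsis for pattern counts. It is not the SketchTree algorithm, whose construction the abstract does not describe. As a stand-in it uses a generic Count-Min sketch and restricts "tree patterns" to parent/child label pairs. The names `CountMinSketch`, `edge_patterns`, and `summarize`, the `(label, [children])` tree encoding, and the default `width` and `depth` values are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never fall below the true count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # The minimum over rows is the cell least inflated by collisions.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


def edge_patterns(tree):
    """Yield 'parent/child' label pairs of a tree given as (label, [children])."""
    label, children = tree
    for child in children:
        yield f"{label}/{child[0]}"
        yield from edge_patterns(child)


def summarize(stream, sketch):
    """Single pass: each tree is read once and then discarded."""
    for tree in stream:
        for pattern in edge_patterns(tree):
            sketch.add(pattern)
    return sketch


# Hypothetical two-tree stream: a(b, c(b)) followed by a(b).
stream = [
    ("a", [("b", []), ("c", [("b", [])])]),
    ("a", [("b", [])]),
]
sketch = summarize(stream, CountMinSketch())
approx_ab = sketch.estimate("a/b")  # true count is 2; estimate may be higher
```

Memory use depends only on `width` and `depth`, not on the length of the stream. The price is that hash collisions can inflate an estimate, so a Count-Min estimate is an upper bound on the true count rather than an exact value.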