SketchTree: Approximate Tree Pattern Counts over Streaming Labeled Trees

Authors:
Praveen Rao; Bongki Moon
Affiliations:
University of Arizona; University of Arizona
Venue:
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Year:
2006

Cited by:
Efficient mining of frequent XML query patterns with repeating-siblings. Information and Software Technology.
A gossip-based approach for Internet-scale cardinality estimation of XPath queries over distributed semistructured data. The VLDB Journal.

Abstract

In recent years, interest in online approximation algorithms for data streams has grown. Some of the key challenges arise because streaming data can be read only once, in a fixed order of arrival, and only a limited amount of memory is available for storage. In this paper, we address the problem of approximately counting tree patterns over a stream of labeled trees (e.g., XML documents). We propose a new approximation algorithm called SketchTree that computes a synopsis of the stream in a single pass, processing each tree only once. Using a limited amount of memory, SketchTree provides approximate answers for both ordered and unordered tree pattern counts. Furthermore, we discuss a class of count queries that SketchTree can handle and their utility. We provide theoretical analyses showing that our algorithm has provably strong guarantees on its error bounds. Experiments on real datasets demonstrate that SketchTree can estimate tree pattern counts within 10-15% relative error with high confidence under various situations.
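
To make the single-pass, bounded-memory setting concrete, the following is a minimal illustrative sketch and not the SketchTree algorithm itself, whose synopsis structure the abstract does not specify. It assumes a generic Count-Min sketch as a stand-in synopsis and counts only the simplest tree patterns, parent/child label pairs. The names CountMinSketch, edge_patterns, and summarize, the (label, children) tree encoding, and the width and depth parameters are all hypothetical choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; a stand-in synopsis, not SketchTree's own."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row, so the rows behave like independent hashes.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum never undercounts.
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


def edge_patterns(tree):
    """Yield a 'parent/child' label string for every edge of a tree.

    A tree is assumed to be a (label, list_of_child_trees) pair.
    """
    label, children = tree
    for child in children:
        yield f"{label}/{child[0]}"
        yield from edge_patterns(child)


def summarize(stream, sketch):
    """Consume each tree of the stream exactly once, updating the sketch."""
    for tree in stream:
        for pattern in edge_patterns(tree):
            sketch.add(pattern)


# Two small labeled trees standing in for a stream of XML documents.
stream = [
    ("a", [("b", []), ("c", [("b", [])])]),
    ("a", [("b", [])]),
]
sketch = CountMinSketch()
summarize(iter(stream), sketch)
approx_ab = sketch.estimate("a/b")  # true count is 2; estimate is at least 2
```

Memory use here is fixed by width and depth regardless of stream length, which is the property the abstract emphasizes. A Count-Min estimate can only overcount, whereas the abstract reports a 10-15% relative error measured for SketchTree itself, so the error behavior of this stand-in should not be read as the paper's.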