Counting graph matches with adaptive statistics collection

Authors:
Jianhua Feng;Qian Qian;Yuguo Liao;Lizhu Zhou
Affiliations:
Department of Computer Science and Technology, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University, Beijing, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006


Abstract

High-performance query processing over large-scale graph-structured data demands high-quality statistics collection and selectivity estimation. Precise and succinct statistics about graph-structured data play a crucial role in graph query selectivity estimation. In this paper, we propose SMT (Succinct Markov Table), an approach that achieves high precision in selectivity estimation while consuming little memory. SMT rests on four core notions: constructing, refining, compressing, and estimating. The efficient algorithm SMTBuilder builds an adaptive statistics model in the form of an SMT. SMT refining introduces versatile optimization rules that investigate local bi-directional reachability. SMT compressing introduces effective grouping techniques. Statistical methods based on SMT are used to estimate the selectivity of various graph queries, especially twig queries. Through a thorough experimental study, we demonstrate SMT's advantages in accuracy and space over previously known alternatives, and we identify which optimization rules and compression techniques suit different real-life data.
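
The abstract does not give the internals of SMT, so the following is only a minimal sketch of the general Markov-table idea that the name alludes to, not the authors' SMTBuilder or their refining and compressing steps. It assumes a table that stores counts of label paths of length 1 and 2, and it estimates the match count of a longer simple path under a first-order Markov assumption. The function names, the data layout, and the toy graph are all hypothetical.

```python
from collections import Counter


def build_markov_table(labels, edges):
    """Count label paths of length 1 (nodes) and length 2 (edges).

    labels: dict mapping node id -> label (hypothetical layout)
    edges:  iterable of (source id, target id) pairs
    """
    table = Counter()
    for label in labels.values():
        table[(label,)] += 1
    for src, dst in edges:
        table[(labels[src], labels[dst])] += 1
    return table


def estimate_path_count(table, path):
    """Estimate the number of matches of a simple label path.

    First-order Markov assumption: the labels that follow a node depend
    only on that node's own label, so
        f(l1/.../ln) ~ f(l1/l2) * prod_i f(li/li+1) / f(li).
    """
    if len(path) == 1:
        return float(table[(path[0],)])
    estimate = float(table[(path[0], path[1])])
    for i in range(1, len(path) - 1):
        denom = table[(path[i],)]
        if denom == 0:
            return 0.0  # the label never occurs, so nothing can match
        estimate *= table[(path[i], path[i + 1])] / denom
    return estimate


# Toy graph (assumed for illustration): one 'a' node, two 'b', three 'c'.
labels = {1: "a", 2: "b", 3: "b", 4: "c", 5: "c", 6: "c"}
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
table = build_markov_table(labels, edges)
print(estimate_path_count(table, ("a", "b", "c")))  # 2 * (3 / 2) = 3.0
```

On this toy graph the estimate equals the true count of a/b/c matches, since every 'b' node has the same parent label. In general the estimate is approximate, which is the kind of gap that refinement and compression strategies in a summary structure aim to manage.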