Estimating selectivity for joined RDF triple patterns

Authors:
Hai Huang;Chengfei Liu
Affiliations:
Swinburne University of Technology, Melbourne, Australia;Swinburne University of Technology, Melbourne, Australia
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

A fundamental problem in RDF query processing is selectivity estimation, which is crucial to query optimization because it determines the join order of RDF triple patterns. In this paper, we focus on selectivity estimation for SPARQL graph patterns. Previous work adopts the join uniformity assumption when estimating the selectivity of joined triple patterns. This assumption can lead to highly inaccurate estimates when the properties in a SPARQL graph pattern are correlated. We take the dependencies among properties in SPARQL graph patterns into account and propose a more accurate estimation model. Since star and chain query patterns are common in SPARQL graph patterns, we first focus on these two basic patterns and propose using a Bayesian network and a chain histogram, respectively, to estimate their selectivity. Then, to estimate the selectivity of an arbitrary SPARQL graph pattern, we design algorithms that make maximal use of the precomputed statistics of the star paths and chain paths. The experiments show that our method outperforms existing approaches in accuracy.
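To make the motivation concrete, the following is a minimal sketch, not the paper's method, of how an estimate that treats triple patterns as independent can miss the true result size of a star pattern when properties are correlated. The toy triples, the property names (`worksAt`, `livesIn`), and the helper functions are all hypothetical, and the simple independence estimate is used here only as a stand-in for the join uniformity assumption mentioned in the abstract.

```python
# Hypothetical toy data: (subject, property, object) triples in which
# "worksAt" and "livesIn" are perfectly correlated.
TRIPLES = [
    ("p1", "worksAt", "uniA"), ("p1", "livesIn", "cityA"),
    ("p2", "worksAt", "uniA"), ("p2", "livesIn", "cityA"),
    ("p3", "worksAt", "uniB"), ("p3", "livesIn", "cityB"),
    ("p4", "worksAt", "uniB"), ("p4", "livesIn", "cityB"),
]

# Star pattern: ?x worksAt uniA . ?x livesIn cityA
STAR = [("worksAt", "uniA"), ("livesIn", "cityA")]


def all_subjects(triples):
    """Return the set of distinct subjects in the data."""
    return {s for s, _, _ in triples}


def subjects_matching(triples, prop, obj):
    """Return the subjects that have a triple (subject, prop, obj)."""
    return {s for s, p, o in triples if p == prop and o == obj}


def independence_estimate(triples, patterns):
    """Estimate the star-pattern result size by multiplying the
    per-pattern selectivities, as if the patterns were independent."""
    n = len(all_subjects(triples))
    estimate = float(n)
    for prop, obj in patterns:
        estimate *= len(subjects_matching(triples, prop, obj)) / n
    return estimate


def true_cardinality(triples, patterns):
    """Count the subjects that actually satisfy every pattern."""
    result = all_subjects(triples)
    for prop, obj in patterns:
        result &= subjects_matching(triples, prop, obj)
    return len(result)


if __name__ == "__main__":
    print("independence estimate:", independence_estimate(TRIPLES, STAR))
    print("true cardinality:", true_cardinality(TRIPLES, STAR))
```

On this toy data the independence estimate is 1.0 while the true cardinality is 2, because every subject that works at `uniA` also lives in `cityA`. A model that captures the dependency between the two properties, such as the Bayesian network the abstract proposes for star patterns, is meant to close this kind of gap.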