Efficient query answering in probabilistic RDF graphs

Authors:
Xiang Lian;Lei Chen
Affiliations:
Hong Kong University of Science and Technology, Hong Kong, Hong Kong;Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

In this paper, we tackle the problem of efficiently answering queries on probabilistic RDF data graphs. Specifically, we model RDF data as probabilistic graphs, so that answering an RDF query is equivalent to searching the probabilistic graphs for subgraphs that match a given query graph with high probability. To process queries on probabilistic RDF graphs efficiently, we propose two effective pruning mechanisms: structural pruning and probabilistic pruning. For structural pruning, we carefully design synopses for vertex/edge labels that take into account their distributions and other structural information, in order to improve the pruning power. For probabilistic pruning, we derive a cost model to guide the pre-computation of probability upper bounds so that the expected query cost is low. We construct an index structure that integrates the synopses and statistics for structural and probabilistic pruning, and propose an efficient approach to answering queries on probabilistic RDF graph data. The efficiency of our solutions has been verified through extensive experiments.
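
The following is a minimal illustrative sketch of the general idea described in the abstract, not the paper's algorithm or index. It assumes a simplified model in which each RDF triple exists independently with a given probability, so the probability of a match is the product of the probabilities of the triples it uses. The function name `match`, the `?`-prefixed variable convention, and the sample data are all hypothetical. Structural pruning is reduced here to a predicate-label check, and probabilistic pruning to abandoning a partial match once its probability (an upper bound on the probability of any extension) drops below the threshold.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(data: Dict[Triple, float], query: List[Triple], threshold: float):
    """Return (binding, probability) pairs whose probability is >= threshold.

    Assumption of this sketch: triples are independent, so a match's
    probability is the product of its triples' probabilities.
    Query terms starting with '?' are variables; predicates are constants.
    """
    results = []

    def bind(term, value, binding):
        # Return an extended binding, or None if the term conflicts with value.
        if term.startswith("?"):
            if term in binding:
                return binding if binding[term] == value else None
            extended = dict(binding)
            extended[term] = value
            return extended
        return binding if term == value else None

    def extend(i, binding, prob):
        # Probabilistic pruning: prob can only shrink as more triples are
        # multiplied in, so it is an upper bound for every extension.
        if prob < threshold:
            return
        if i == len(query):
            results.append((binding, prob))
            return
        qs, qp, qo = query[i]
        for (s, p, o), pr in data.items():
            # Structural pruning (simplified): predicate labels must agree.
            if p != qp:
                continue
            b = bind(qs, s, binding)
            if b is None:
                continue
            b = bind(qo, o, b)
            if b is None:
                continue
            extend(i + 1, b, prob * pr)

    extend(0, {}, 1.0)
    return results


# Hypothetical sample data: who knows whom, and who works where.
data = {
    ("alice", "knows", "bob"): 0.9,
    ("bob", "worksAt", "acme"): 0.8,
    ("alice", "knows", "carol"): 0.5,
    ("carol", "worksAt", "acme"): 0.4,
}
query = [("?x", "knows", "?y"), ("?y", "worksAt", "acme")]
results = match(data, query, 0.5)
```

With the sample data, only the binding of `?y` to `bob` survives a threshold of 0.5 (probability about 0.72); the match through `carol` has probability about 0.2 and is discarded. A real system would replace the linear scan over `data` with an index, which is the role the abstract assigns to its synopses and pre-computed upper bounds.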