Scalable join processing on very large RDF graphs

Authors:
Thomas Neumann; Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbrücken, Germany; Max-Planck Institute for Informatics, Saarbrücken, Germany
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009


Abstract

With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. This paper instead focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans. We present two contributions for scalable join processing. First, we develop very lightweight methods for sideways information passing between separate joins at query run-time, to provide highly effective filters on the input streams of joins. Second, we improve previously proposed algorithms for join-order optimization by more accurate selectivity estimates for very large RDF graphs. Experimental studies with several RDF datasets, including the UniProt collection, demonstrate the performance gains of our approach, outperforming the previously fastest systems by more than an order of magnitude.
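
To make the idea of filtering join inputs at run-time more concrete, the following is a minimal sketch, not the authors' implementation. It assumes two sorted index scans that feed a merge join on one shared variable, with distinct keys in each scan. The class `SortedScan`, its `skip_to` method, and the `steps` counter are hypothetical names introduced only for illustration. When the join sees that one side is behind, it passes the other side's current key as a hint so the lagging scan can jump ahead instead of reading every entry.

```python
from bisect import bisect_left


class SortedScan:
    """Hypothetical index scan over sorted, distinct join keys."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.pos = 0
        self.steps = 0  # number of positioning operations performed

    def current(self):
        # Return the key under the cursor, or None when exhausted.
        return self.keys[self.pos] if self.pos < len(self.keys) else None

    def advance(self):
        # Move to the next entry, one at a time.
        self.pos += 1
        self.steps += 1

    def skip_to(self, bound):
        # Jump to the first key >= bound, using the hint from the other side.
        self.pos = bisect_left(self.keys, bound, self.pos)
        self.steps += 1


def merge_join(left, right, use_hints):
    """Merge join on distinct keys; optionally pass skip hints sideways."""
    out = []
    l, r = left.current(), right.current()
    while l is not None and r is not None:
        if l == r:
            out.append(l)
            left.advance()
            right.advance()
        elif l < r:
            # Left is behind: either crawl forward or jump using right's key.
            if use_hints:
                left.skip_to(r)
            else:
                left.advance()
        else:
            if use_hints:
                right.skip_to(l)
            else:
                right.advance()
        l, r = left.current(), right.current()
    return out


def run(use_hints):
    # Illustrative data only: a dense scan joined with a very selective one.
    left = SortedScan(range(1000))
    right = SortedScan([5, 500, 995])
    result = merge_join(left, right, use_hints)
    return result, left.steps + right.steps
```

Calling `run(False)` and `run(True)` yields the same join result, but the hinted variant performs far fewer positioning steps on the dense input. This is the effect that a run-time filter on a join's input stream aims for. How such hints are shared between separate joins of a larger plan is not shown here.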