Scalable indexing of RDF graphs for efficient join processing

Authors:
George H.L. Fletcher; Peter W. Beck
Affiliations:
Eindhoven University of Technology, Eindhoven, Netherlands; Washington State University, Vancouver, WA, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009


Abstract

Current approaches to RDF graph indexing suffer from weak data locality: information about a single piece of data appears in multiple locations, spanning multiple data structures. Weak data locality increases both storage and query processing costs. Towards stronger data locality, we propose the Three-way Triple Tree (TripleT), a secondary-memory indexing technique that facilitates flexible and efficient join evaluation on RDF data. The novelty of TripleT is twofold: the index is built over the individual atoms occurring in the data set, rather than at a coarser granularity such as whole triples; and the atoms are indexed regardless of the roles (i.e., subjects, predicates, or objects) they play in the triples of the data set. We show through extensive empirical evaluation that TripleT exhibits multiple orders of magnitude improvement over the state of the art, in terms of both storage and query processing costs.
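
The idea of indexing atoms regardless of role can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not the TripleT implementation: the abstract describes a secondary-memory index and does not give its layout, so the three-bucket-per-atom structure, the class name `AtomIndex`, its methods, and the sample triples below are all hypothetical.

```python
from collections import defaultdict


class AtomIndex:
    """Toy in-memory index keyed by atom rather than by whole triple.

    Assumption: each atom gets a single entry holding three buckets,
    one for each role (subject, predicate, object) the atom can play.
    """

    def __init__(self):
        # One entry per atom, so everything known about an atom sits together.
        self._entries = defaultdict(lambda: {"s": set(), "p": set(), "o": set()})

    def add(self, s, p, o):
        # Each bucket stores the other two atoms of the triple.
        self._entries[s]["s"].add((p, o))
        self._entries[p]["p"].add((s, o))
        self._entries[o]["o"].add((s, p))

    def lookup(self, atom, role):
        # Return a copy of the bucket for `atom` in the given role.
        entry = self._entries.get(atom)
        return set(entry[role]) if entry else set()

    def join_object_subject(self, p1, p2):
        """Evaluate (?x p1 ?y) joined with (?y p2 ?z) on the shared atom ?y."""
        results = set()
        for x, y in self.lookup(p1, "p"):
            # The same entry for y serves the subject-role lookup.
            for p, z in self.lookup(y, "s"):
                if p == p2:
                    results.add((x, y, z))
        return results


index = AtomIndex()
index.add("alice", "knows", "bob")
index.add("bob", "worksAt", "tue")
index.add("bob", "knows", "carol")
print(index.join_object_subject("knows", "worksAt"))
```

In this sketch the join visits a single entry for each candidate value of the shared variable, which is the locality property the abstract argues for; an index over whole triples would instead consult separate structures for the object-side and subject-side matches.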