TripleBit: a fast and compact system for large scale RDF data

Authors:
Pingpeng Yuan;Pu Liu;Buwen Wu;Hai Jin;Wenya Zhang;Ling Liu
Affiliations:
Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China;Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China;Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China;Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China;Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China;Distributed Data Intensive Systems Lab., School of Computer Science, College of Computing, Georgia Institute of Technology
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

The volume of RDF data has grown steadily over the past decade, and many well-known RDF datasets now contain billions of triples. A grand challenge in managing RDF data at this scale is how to access it efficiently. A popular approach to addressing the problem is to build a full set of permutations of (S, P, O) indexes. Although this approach has been shown to accelerate joins by orders of magnitude, its large space overhead limits scalability and makes it heavyweight. In this paper, we present TripleBit, a fast and compact system for storing and accessing RDF data. The design of TripleBit has three salient features. First, the compact design of TripleBit reduces both the size of stored RDF data and the size of its indexes. Second, TripleBit introduces two auxiliary index structures, the ID-Chunk bit matrix and the ID-Predicate bit matrix, to minimize the cost of index selection during query evaluation. Third, its query processor dynamically generates an optimal execution ordering for join queries, leading to fast query execution and an effective reduction in the size of intermediate results. Our experiments show that TripleBit outperforms RDF-3X, MonetDB, and BitMat on LUBM, UniProt, and BTC 2012 benchmark queries, and that it offers orders-of-magnitude performance improvements for some complex join queries.
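
The "full set of permutations of (S, P, O) indexes" approach that the abstract contrasts against can be illustrated with a minimal sketch. This is not TripleBit's design, nor the implementation of any system named above; the function names `build_permutation_indexes` and `prefix_lookup` and the sample triples are hypothetical, chosen only to show why six sorted copies answer any bound-prefix lookup quickly while multiplying the number of stored entries by six.

```python
from bisect import bisect_left
from itertools import permutations


def build_permutation_indexes(triples):
    """Return one sorted list per ordering of (S, P, O), six in total."""
    indexes = {}
    for order in permutations(range(3)):
        name = "".join("SPO"[i] for i in order)  # e.g. "PSO"
        # Reorder every triple's components and keep the copy sorted.
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes


def prefix_lookup(index, prefix):
    """Return all entries of a sorted index that start with `prefix`."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first possible match.
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        matches.append(entry)
    return matches


# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "hust"),
    ("bob", "knows", "carol"),
]

indexes = build_permutation_indexes(triples)

# Every triple is stored once per permutation: 6 x 3 entries here.
stored_entries = sum(len(ix) for ix in indexes.values())

# "Who knows whom?" binds only the predicate, so the PSO ordering serves it.
knows_pairs = prefix_lookup(indexes["PSO"], ("knows",))
```

Whatever combination of subject, predicate, and object a query pattern binds, one of the six orderings has those components as a sorted prefix, which is what makes merge joins cheap. The cost is the sixfold replication visible in `stored_entries`, the space overhead the abstract identifies as the limit on scalability.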