TripleBit: a fast and compact system for large scale RDF data

  • Authors:
  • Pingpeng Yuan; Pu Liu; Buwen Wu; Hai Jin; Wenya Zhang; Ling Liu

  • Affiliations:
  • Services Computing Tech. and System Lab., School of Computer Science & Technology, Huazhong University of Science and Technology, China (Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang); Distributed Data Intensive Systems Lab., School of Computer Science, College of Computing, Georgia Institute of Technology (Ling Liu)

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2013


Abstract

The volume of RDF data has continued to grow over the past decade, and many known RDF datasets now contain billions of triples. A grand challenge in managing such huge volumes of RDF data is how to access them efficiently. A popular approach to this problem is to build a full set of permutations of (S, P, O) indexes. Although this approach has been shown to accelerate joins by orders of magnitude, its large space overhead limits its scalability and makes it heavyweight. In this paper, we present TripleBit, a fast and compact system for storing and accessing RDF data. The design of TripleBit has three salient features. First, its compact design reduces both the size of the stored RDF data and the size of its indexes. Second, TripleBit introduces two auxiliary index structures, the ID-Chunk bit matrix and the ID-Predicate bit matrix, to minimize the cost of index selection during query evaluation. Third, its query processor dynamically generates an optimal execution ordering for join queries, leading to fast query execution and an effective reduction in the size of intermediate results. Our experiments show that TripleBit outperforms RDF-3X, MonetDB, and BitMat on the LUBM, UniProt, and BTC 2012 benchmark queries, and that it offers orders-of-magnitude performance improvements for some complex join queries.
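The abstract contrasts the "full set of permutations of (S, P, O) indexes" with TripleBit's lighter auxiliary bit matrices. The Python sketch below is purely illustrative and is not TripleBit's implementation. It first builds all 3! = 6 permutation indexes over a toy set of integer-ID triples to show why that approach stores every triple six times. It then builds a simplified ID-Predicate bit matrix, assumed here to record, for each entity ID, one bit per predicate that the ID occurs with (as subject or object). The data and the names `permutation_indexes`, `id_predicate_bits`, and `candidate_predicates` are hypothetical.

```python
from itertools import permutations

# Hypothetical toy data: (subject, predicate, object) triples with integer IDs.
TRIPLES = [
    (1, 10, 2),
    (1, 11, 3),
    (2, 10, 3),
    (3, 12, 1),
]

def permutation_indexes(triples):
    """Build one sorted copy of the triples per ordering of (S, P, O).

    This mimics the 'full set of permutations' approach: 3! = 6 indexes,
    each holding every triple, so storage grows to 6x the raw triple count.
    """
    indexes = {}
    for order in permutations(range(3)):  # e.g. (0, 1, 2) -> "SPO", (1, 2, 0) -> "POS"
        name = "".join("SPO"[i] for i in order)
        indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)
    return indexes

def id_predicate_bits(triples):
    """Simplified ID-Predicate bit matrix (an assumption, not the paper's layout).

    Each entity ID maps to an integer bitmask; bit k is set if the ID occurs
    as subject or object of the k-th predicate in sorted predicate order.
    """
    preds = sorted({p for _, p, _ in triples})
    pos = {p: k for k, p in enumerate(preds)}
    bits = {}
    for s, p, o in triples:
        for entity in (s, o):
            bits[entity] = bits.get(entity, 0) | (1 << pos[p])
    return preds, bits

def candidate_predicates(entity, preds, bits):
    """Return the predicates whose data could contain this entity."""
    mask = bits.get(entity, 0)
    return [p for k, p in enumerate(preds) if (mask >> k) & 1]

idx = permutation_indexes(TRIPLES)
print(len(idx), sum(len(v) for v in idx.values()))  # prints "6 24": 4 triples stored 6 times

preds, bits = id_predicate_bits(TRIPLES)
print(candidate_predicates(2, preds, bits))  # prints "[10]": entity 2 only appears with predicate 10
```

In this sketch, a query touching entity 2 could consult one small bitmask and skip the data for predicates 11 and 12 entirely, instead of relying on six full copies of the triples. TripleBit's actual chunk organization, compression scheme, and ID-Chunk bit matrix are not reproduced here.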