Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data

Authors:
Medha Atre;Vineet Chaoji;Mohammed J. Zaki;James A. Hendler
Affiliations:
Rensselaer Polytechnic Institute, Troy, NY, USA;Yahoo! Labs, Bangalore, India;Rensselaer Polytechnic Institute, Troy, NY, USA;Rensselaer Polytechnic Institute, Troy, NY, USA
Venue:
Proceedings of the 19th international conference on World wide web
Year:
2010

Citing 13
Cited 27

Multi-table joins through bitmapped join indices

ACM SIGMOD Record
Using Semi-Joins to Solve Relational Queries

Journal of the ACM (JACM)
Performance Measurements of Compressed Bitmap Indices

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
RDFPeers: a scalable distributed RDF repository based on a structured peer-to-peer network

Proceedings of the 13th international conference on World Wide Web
Integrating compression and execution in column-oriented database systems

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Scalable semantic web data management using vertical partitioning

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
RDF-3X: a RISC-style engine for RDF

Proceedings of the VLDB Endowment
Hexastore: sextuple indexing for semantic web data management

Proceedings of the VLDB Endowment
Column-store support for RDF data management: not all swans are white

Proceedings of the VLDB Endowment
Scalable join processing on very large RDF graphs

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
GRIN: a graph based RDF index

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
RDFCube: a P2P-based three-dimensional index for structural joins on distributed triple stores

DBISP2P'05/06 Proceedings of the 2005/2006 international conference on Databases, information systems, and peer-to-peer computing
BRAHMS: a workbench RDF store and high performance memory system for semantic association discovery

ISWC'05 Proceedings of the 4th international conference on The Semantic Web

Deep integration of spatial query processing into native RDF triple stores

Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems
Invited paper: Scalable reduction of large datasets to interesting subsets

Web Semantics: Science, Services and Agents on the World Wide Web
Compact representation of large RDF data sets for publishing and exchange

ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
Efficient querying of distributed linked data

Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Efficient query answering in probabilistic RDF graphs

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Massive-scale RDF processing using compressed bitmap indexes

SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Database foundations for scalable RDF processing

RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
TripleCloud: An Infrastructure for Exploratory Querying over Web-Scale RDF Data

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
ANAPSID: an adaptive query processing engine for SPARQL endpoints

ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
Lightweighting the web of data through compact RDF/HDT

CAEPIA'11 Proceedings of the 14th international conference on Advances in artificial intelligence: spanish association for artificial intelligence
To cache or not to cache: the effects of warming cache in complex SPARQL queries

OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part II
Linked data indexing methods: a survey

OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems
Efficiency analysis in content based image retrieval using RDF annotations

MICAI'11 Proceedings of the 10th international conference on Artificial Intelligence: advances in Soft Computing - Volume Part II
Binary RDF for scalable publishing, exchanging and consumption in the web of data

Proceedings of the 21st international conference companion on World Wide Web
Database techniques for linked data management

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
SPOVC: a scalable RDF store using horizontal partitioning and column oriented DBMS

SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management
SPARQL query answering with bitmap indexes

SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management
Heuristics-based query optimisation for SPARQL

Proceedings of the 15th International Conference on Extending Database Technology
Partitioned indexes for entity search over RDF knowledge bases

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Efficient subgraph matching on billion node graphs

Proceedings of the VLDB Endowment
Efficient graph management based on bitmap indices

Proceedings of the 16th International Database Engineering & Applications Symposium
Scalable SPARQL querying processing on large RDF data in cloud computing environment

ICPCA/SWS'12 Proceedings of the 2012 international conference on Pervasive Computing and the Networked World
Binary RDF representation for publication and exchange (HDT)

Web Semantics: Science, Services and Agents on the World Wide Web
A distributed graph engine for web scale RDF data

Proceedings of the VLDB Endowment
TripleBit: a fast and compact system for large scale RDF data

Proceedings of the VLDB Endowment
k-nearest keyword search in RDF graphs

Web Semantics: Science, Services and Agents on the World Wide Web
Editorial: Efficient incremental update and querying in AWETO RDF storage system

Data & Knowledge Engineering

Abstract

Until now, the Semantic Web community has used traditional database systems to store and query RDF data. The SPARQL query language also closely follows SQL syntax. As a natural consequence, most SPARQL query processing techniques are based on database query processing and optimization techniques. For SPARQL join query optimization, previous works such as RDF-3X and Hexastore have proposed using 6-way indexes on the RDF data. Although these indexes speed up merge-joins by orders of magnitude, the scalability of the query processor remains a challenge for complex join queries that generate large intermediate join results. In this paper, we introduce (i) BitMat, a compressed bit-matrix structure for storing huge RDF graphs, and (ii) a novel, lightweight SPARQL join query processing method that employs an initial pruning technique, followed by a variable-binding-matching algorithm on BitMats to produce the final results. Our query processing method does not build intermediate join tables and works directly on the compressed data. We have demonstrated our method on RDF graphs of up to 1.33 billion triples, the largest among results published until now for single-node, non-parallel systems, and have compared it with the state-of-the-art RDF stores RDF-3X and MonetDB. Our results show that the competing methods are most effective on highly selective queries. BitMat, on the other hand, can deliver 2-3 orders of magnitude better performance on complex, low-selectivity queries over massive data.
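
The abstract's two ideas, a bit-matrix representation of triples and a pruning step that avoids intermediate join tables, can be illustrated with a small Python sketch. This is not the authors' implementation: it is uncompressed, and the function names, the shared integer ID space, and the choice of one subject-by-object matrix per predicate are all assumptions made for illustration. It only shows how candidate bindings for a shared join variable might be narrowed with a bitwise AND before any results are materialized.

```python
from collections import defaultdict


def build_bitmats(triples):
    """Build one subject-by-object bit matrix per predicate.

    Hypothetical layout: each matrix is a dict mapping a subject ID to an
    int whose set bits are object IDs. No compression is applied here.
    """
    ids = {}

    def node_id(term):
        # Assign the next free integer ID on first sight of a term.
        return ids.setdefault(term, len(ids))

    mats = defaultdict(dict)
    for s, p, o in triples:
        sid, oid = node_id(s), node_id(o)
        row = mats[p]
        row[sid] = row.get(sid, 0) | (1 << oid)
    return ids, mats


def subject_mask(mat):
    """Bitmask of all subject IDs that have at least one set bit in mat."""
    mask = 0
    for sid in mat:
        mask |= 1 << sid
    return mask


def prune_on_shared_subject(mat_a, mat_b):
    """Drop rows whose subject cannot bind the shared join variable.

    Models a query such as `?x worksAt ?y . ?x livesIn ?z`: a subject is
    kept only if it appears in both matrices. No join table is built.
    """
    keep = subject_mask(mat_a) & subject_mask(mat_b)

    def prune(mat):
        return {sid: row for sid, row in mat.items() if (keep >> sid) & 1}

    return prune(mat_a), prune(mat_b)


# Toy data (invented for this example).
triples = [
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("alice", "livesIn", "troy"),
    ("carol", "livesIn", "troy"),
]
ids, mats = build_bitmats(triples)
works, lives = prune_on_shared_subject(mats["worksAt"], mats["livesIn"])
```

After pruning, only the row for `alice` survives in both matrices, since `bob` and `carol` each appear in just one triple pattern. A variable-binding-matching step, as named in the abstract, would then enumerate final results from the pruned matrices; that step is not sketched here.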