Scalable RDF data compression with MapReduce

  • Authors:
  • Jacopo Urbani, Jason Maassen, Niels Drost, Frank Seinstra, Henri Bal

  • Affiliations:
  • Dept. of Computer Science, VU University, Amsterdam, The Netherlands (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2013

Abstract

The Semantic Web contains many billions of statements, which are published using the resource description framework (RDF) data model. To handle these large amounts of data, high-performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression step is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress large amounts of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework and evaluated its performance. We show that our approach efficiently compresses large amounts of data and scales linearly with both the input size and the number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.
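
To illustrate the idea of MapReduce-based dictionary encoding, the sketch below builds a term dictionary for N-Triples input on Hadoop: mappers emit every RDF term, and each reducer assigns globally unique numeric IDs by interleaving a local counter with its task index. This is a minimal, hypothetical sketch for illustration only; it is not the paper's algorithm (which, among other things, treats popular terms specially), and the class names, the naive term splitting, and the ID-interleaving scheme are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: build a dictionary (term -> numeric ID) for N-Triples data.
public class BuildDictionarySketch {

  // Map: split each statement into its terms and emit each term.
  public static class TermMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString().trim();
      if (s.isEmpty() || s.startsWith("#")) return;
      // Naive split; real N-Triples parsing must handle literals containing spaces and dots.
      String body = s.substring(0, s.lastIndexOf('.')).trim();
      for (String term : body.split("\\s+", 3)) {
        ctx.write(new Text(term), NullWritable.get());
      }
    }
  }

  // Reduce: each unique term arrives once; assign a globally unique ID by
  // interleaving the local counter with the reducer's task index.
  public static class AssignIdReducer extends Reducer<Text, NullWritable, Text, LongWritable> {
    private long counter = 0;
    private int reducerId;
    private int numReducers;

    @Override
    protected void setup(Context ctx) {
      reducerId = ctx.getTaskAttemptID().getTaskID().getId();
      numReducers = ctx.getNumReduceTasks();
    }

    @Override
    protected void reduce(Text term, Iterable<NullWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long id = reducerId + counter * (long) numReducers;
      counter++;
      ctx.write(term, new LongWritable(id));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "rdf-dictionary-encoding-sketch");
    job.setJarByClass(BuildDictionarySketch.class);
    job.setMapperClass(TermMapper.class);
    job.setReducerClass(AssignIdReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(NullWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A second pass (not shown) would join the triples against this dictionary to replace each term with its ID, and the inverse join would decompress the data.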