Scalable RDF data compression with MapReduce

  • Authors:
  • Jacopo Urbani, Jason Maassen, Niels Drost, Frank Seinstra, Henri Bal

  • Affiliations:
  • Dept. of Computer Science, VU University, Amsterdam, The Netherlands (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2013

Abstract

The Semantic Web contains many billions of statements, which are published using the resource description framework (RDF) data model. To handle these large amounts of data, high-performance RDF applications must apply a compression technique. Unfortunately, because of the large input size, even this compression step is challenging. In this paper, we propose a set of distributed MapReduce algorithms to efficiently compress and decompress large amounts of RDF data. Our approach uses a dictionary encoding technique that maintains the structure of the data. We highlight the problems of distributed data compression and describe the solutions that we propose. We have implemented a prototype using the Hadoop framework and evaluated its performance. We show that our approach efficiently compresses large amounts of data and scales linearly with both the input size and the number of nodes. Copyright © 2012 John Wiley & Sons, Ltd.
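
To illustrate the idea of MapReduce-based dictionary encoding, the sketch below builds a term dictionary for N-Triples input on Hadoop: mappers emit every RDF term, and each reducer assigns globally unique numeric IDs by interleaving a local counter with its task index. This is a minimal, hypothetical sketch for illustration only; it is not the paper's algorithm (which, among other things, treats popular terms specially), and the class names, the naive term splitting, and the ID-interleaving scheme are assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: build a dictionary (term -> numeric ID) for N-Triples data.
public class BuildDictionarySketch {

  // Map: split each statement into its terms and emit each term.
  public static class TermMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString().trim();
      if (s.isEmpty() || s.startsWith("#")) return;
      // Naive split; real N-Triples parsing must handle literals containing spaces and dots.
      String body = s.substring(0, s.lastIndexOf('.')).trim();
      for (String term : body.split("\\s+", 3)) {
        ctx.write(new Text(term), NullWritable.get());
      }
    }
  }

  // Reduce: each unique term arrives once; assign a globally unique ID by
  // interleaving the local counter with the reducer's task index.
  public static class AssignIdReducer extends Reducer<Text, NullWritable, Text, LongWritable> {
    private long counter = 0;
    private int reducerId;
    private int numReducers;

    @Override
    protected void setup(Context ctx) {
      reducerId = ctx.getTaskAttemptID().getTaskID().getId();
      numReducers = ctx.getNumReduceTasks();
    }

    @Override
    protected void reduce(Text term, Iterable<NullWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long id = reducerId + counter * (long) numReducers;
      counter++;
      ctx.write(term, new LongWritable(id));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "rdf-dictionary-encoding-sketch");
    job.setJarByClass(BuildDictionarySketch.class);
    job.setMapperClass(TermMapper.class);
    job.setReducerClass(AssignIdReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(NullWritable.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A second pass (not shown) would join the triples against this dictionary to replace each term with its ID, and the inverse join would decompress the data.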