Binary RDF representation for publication and exchange (HDT)

Authors:
Javier D. Fernández; Miguel A. Martínez-Prieto; Claudio Gutiérrez; Axel Polleres; Mario Arias
Affiliations:
DataWeb Research, Department of Computer Science, University of Valladolid, E.T.S. de Ingeniería Informática, Campus Miguel Delibes, 47011 Valladolid, Spain;DataWeb Research, Department of Computer Science, University of Valladolid, E.T.S. de Ingeniería Informática, Campus Miguel Delibes, 47011 Valladolid, Spain and Department of Computer Sc ...;Department of Computer Science, University of Chile, Avenida Blanco Encalada 2120, 837-0459 Santiago, Chile;Digital Enterprise Research Institute, National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway, Ireland and Siemens AG Österreich, Siemensstrasse 90, 1210 Vienna, Aust ...;DataWeb Research, Department of Computer Science, University of Valladolid, E.T.S. de Ingeniería Informática, Campus Miguel Delibes, 47011 Valladolid, Spain and Siemens AG Österreic ...
Venue:
Web Semantics: Science, Services and Agents on the World Wide Web
Year:
2013

Abstract

The current Web of Data is producing increasingly large RDF datasets. Massive publication efforts of RDF data, driven by initiatives like the Linked Open Data movement, and the need to exchange large datasets have unveiled the drawbacks of traditional RDF representations, which were inspired by and designed for a document-centric, human-readable Web. Among the main problems are high levels of verbosity and redundancy and weak machine-processable capabilities in the description of these datasets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF datasets: Header information, a Dictionary, and the actual Triples structure (thus called HDT). Our experimental evaluation shows that datasets in HDT format can be compacted by more than fifteen times as compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.
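
The Header/Dictionary/Triples split described in the abstract can be illustrated with a toy sketch. This is not the HDT binary format or the authors' implementation: the function names `encode` and `decode`, the header fields, and the sample data are all hypothetical, and no compression is applied. It only shows the general idea of replacing repeated RDF terms with integer IDs and keeping the triple structure separate from the term strings.

```python
def encode(triples):
    """Toy split of string triples into header, dictionary, and ID triples.

    Assumption: one shared ID space for all terms; IDs start at 1.
    """
    # Dictionary: every distinct term appears once, in sorted order.
    terms = sorted({term for triple in triples for term in triple})
    ids = {term: i for i, term in enumerate(terms, start=1)}
    # Triples: the graph structure expressed only as integer IDs.
    id_triples = sorted({tuple(ids[t] for t in triple) for triple in triples})
    # Header: minimal descriptive metadata about the dataset (hypothetical fields).
    header = {"triples": len(id_triples), "terms": len(terms)}
    return header, terms, id_triples


def decode(terms, id_triples):
    """Map ID triples back to their string terms."""
    return [tuple(terms[i - 1] for i in triple) for triple in id_triples]


# Hypothetical sample data: "ex:alice" occurs three times in the input
# but is stored only once in the dictionary.
sample = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
]
header, terms, id_triples = encode(sample)
restored = decode(terms, id_triples)
```

In this sketch, the verbosity the abstract mentions is reduced because each long term string is stored once and every later occurrence costs only an integer; the format the article proposes goes further by compacting each component.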