Querying RDF dictionaries in compressed space

Authors:
Miguel A. Martínez-Prieto; Javier D. Fernández; Rodrigo Cánovas
Affiliations:
Univ. of Valladolid, Spain and Univ. of Chile, Chile; Univ. of Valladolid, Spain and Univ. of Chile, Chile; Univ. of Melbourne, Australia and Univ. of Chile, Chile
Venue:
ACM SIGAPP Applied Computing Review
Year:
2012

Abstract

The use of dictionaries is a common practice among applications that operate on huge RDF datasets. A dictionary allows the long terms occurring in RDF triples to be replaced by short IDs that reference them. This greatly compacts the dataset and mitigates the scalability issues underlying its management. However, the dictionary itself is not negligible in size, and the techniques used to represent it also suffer from scalability limitations. This paper addresses this scenario by adapting compression techniques for string dictionaries to the case of RDF. We propose a novel technique, Dcomp, which can be tuned to represent the dictionary in compressed space (22–64%) and to perform basic lookup operations in a few microseconds (1–50 μs). In addition, we propose Dcomp as a basis for specific SPARQL query optimizations that leverage its ability to resolve FILTER expressions early.
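The abstract relies on two basic dictionary lookups: mapping a term to its ID and mapping an ID back to its term, both performed over a compressed representation. The sketch below is not Dcomp. It illustrates plain front coding, a classic string-dictionary compression technique, applied to a single sorted dictionary. The class name `FrontCodedDictionary`, the method names `locate`, `extract`, `lower_bound` and `prefix_range`, the 1-based IDs, the bucket size and the example terms are all hypothetical choices made for illustration.

```python
import bisect
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class FrontCodedDictionary:
    """Hypothetical sorted term dictionary (IDs 1..n) stored with plain front coding.

    Terms are grouped into buckets of `bucket_size`. Each bucket keeps its first
    term (the head) verbatim; every other term is stored as a pair
    (length of the prefix shared with the previous term, remaining suffix).
    """

    def __init__(self, terms: List[str], bucket_size: int = 4) -> None:
        self.bucket_size = bucket_size
        sorted_terms = sorted(set(terms))
        self.n = len(sorted_terms)
        self.heads: List[str] = []
        self.buckets: List[List[Tuple[int, str]]] = []
        for start in range(0, self.n, bucket_size):
            chunk = sorted_terms[start:start + bucket_size]
            self.heads.append(chunk[0])
            coded = []
            for prev, cur in zip(chunk, chunk[1:]):
                lcp = common_prefix_len(prev, cur)
                coded.append((lcp, cur[lcp:]))
            self.buckets.append(coded)

    def _decode_bucket(self, b: int) -> List[str]:
        """Rebuild every term of bucket b from its head and front-coded pairs."""
        out = [self.heads[b]]
        for lcp, suffix in self.buckets[b]:
            out.append(out[-1][:lcp] + suffix)
        return out

    def extract(self, term_id: int) -> str:
        """ID -> term. Decodes only the one bucket holding the ID."""
        if not 1 <= term_id <= self.n:
            raise KeyError(term_id)
        b, offset = divmod(term_id - 1, self.bucket_size)
        return self._decode_bucket(b)[offset]

    def lower_bound(self, term: str) -> int:
        """ID of the first term >= `term`, or n + 1 if every term is smaller."""
        # Binary search over the bucket heads: only the last bucket whose head
        # is <= term can contain the answer before the next bucket's head.
        b = bisect.bisect_right(self.heads, term) - 1
        if b < 0:
            return 1
        for offset, t in enumerate(self._decode_bucket(b)):
            if t >= term:
                return b * self.bucket_size + offset + 1
        return min((b + 1) * self.bucket_size + 1, self.n + 1)

    def locate(self, term: str) -> int:
        """Term -> ID, or 0 if the term is not in the dictionary."""
        term_id = self.lower_bound(term)
        if term_id <= self.n and self.extract(term_id) == term:
            return term_id
        return 0

    def prefix_range(self, prefix: str) -> Tuple[int, int]:
        """Inclusive ID range of terms starting with `prefix` (empty if hi < lo)."""
        lo = hi = self.lower_bound(prefix)
        while hi <= self.n and self.extract(hi).startswith(prefix):
            hi += 1
        return lo, hi - 1


# Hypothetical example terms, not taken from the paper.
terms = [
    "http://example.org/alice", "http://example.org/bob",
    "http://example.org/carol", "http://example.org/dave",
    "http://xmlns.com/foaf/0.1/knows", "http://xmlns.com/foaf/0.1/name",
    '"Alice"', '"Bob"',
]
d = FrontCodedDictionary(terms, bucket_size=4)

# Replace a triple of long terms by a triple of short IDs, then restore it.
triple = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows",
          "http://example.org/bob")
id_triple = tuple(d.locate(t) for t in triple)
print(id_triple)                                          # (3, 7, 4)
print(tuple(d.extract(i) for i in id_triple) == triple)   # True
# Terms sharing a prefix form a contiguous ID range.
print(d.prefix_range("http://example.org/"))              # (3, 6)
```

Because the terms are sorted, all terms sharing a prefix occupy a contiguous ID range, so under these assumptions a prefix-based FILTER could be checked by comparing IDs instead of decoding strings. This is only one plausible reading of "early FILTER resolution", not the method described in the paper. The bucket size is the usual trade-off: larger buckets store fewer verbatim heads and compress better, but each lookup must decode more terms.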