Rya: a scalable RDF triple store for the clouds

Authors: Roshan Punnoose; Adina Crainiceanu; David Rapp
Affiliations: Proteus Technologies; US Naval Academy; Laboratory for Telecommunication Sciences
Venue: Proceedings of the 1st International Workshop on Cloud Intelligence
Year: 2012




Abstract

The Resource Description Framework (RDF) was designed with the initial goal of developing metadata for the Internet. While the Internet is a conglomeration of many interconnected networks and computers, most of today's best RDF storage solutions are confined to a single node. Working on a single node raises significant scalability issues, especially given the magnitude of modern-day data. In this paper we introduce a scalable RDF data management system that uses Accumulo, a Google Bigtable variant. We introduce storage methods, indexing schemes, and query processing techniques that scale to billions of triples across multiple nodes, while providing fast and easy access to the data through conventional query mechanisms such as SPARQL. Our performance evaluation shows that in most cases our system outperforms existing distributed RDF solutions, even systems much more complex than ours.