TripleCloud: An Infrastructure for Exploratory Querying over Web-Scale RDF Data

Authors:
Christophe Gueret; Spyros Kotoulas; Paul Groth
Venue:
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2011

Abstract

As the availability of large-scale RDF data sets has grown, so has the interest of researchers and practitioners in analyzing and investigating them. However, given the size and messiness of these data sets, setting up the infrastructure to store and query them carries significant overhead. In this paper, we present TripleCloud, a system that aims to lower the cost of entry for exploring Web-scale RDF data sets. The system takes advantage of existing cloud-based key-value stores (e.g., BigTable, HBase) both to enable scalability and to hide the complexities of infrastructure deployment and maintenance. On top of these key-value stores, it layers a robust query engine that can return approximate answers. We evaluate the scalability of the approach on complex queries over more than 3 billion triples. In addition to an implementation over HBase, TripleCloud runs on Google App Engine, which allows us to perform a cost evaluation of the approach.
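The abstract does not say how triples are laid out in the key-value store or how approximate answers are scored. The Python sketch below illustrates one common pattern under stated assumptions and is not TripleCloud's actual design. Every name in it (`SortedKVStore`, `add_triple`, `match`, `satisfied_fraction`) is hypothetical. An in-memory sorted list stands in for a BigTable/HBase-like store, so no real client API is used. Each triple is written under three permutation keys (SPO, POS, OSP) so that any triple pattern becomes one prefix scan. As an illustrative choice, a candidate answer is scored by the fraction of the query's triple patterns it satisfies.

```python
from bisect import bisect_left
from typing import Iterator, Optional

# Hypothetical key layout (an assumption, not taken from the paper): every
# triple is written under three permutations so any pattern is one prefix scan.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}
SEP = "\x00"  # assumed never to occur inside an RDF term

Pattern = tuple[Optional[str], Optional[str], Optional[str]]


class SortedKVStore:
    """In-memory stand-in for a sorted key-value store (HBase/BigTable-like):
    keys are kept in lexicographic order and read back with prefix scans."""

    def __init__(self) -> None:
        self._keys: list[str] = []

    def put(self, key: str) -> None:
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)  # keep keys sorted and unique

    def scan_prefix(self, prefix: str) -> Iterator[str]:
        i = bisect_left(self._keys, prefix)  # first key >= prefix
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


def add_triple(store: SortedKVStore, s: str, p: str, o: str) -> None:
    triple = (s, p, o)
    for name, perm in ORDERS.items():
        # Trailing SEP so that "ex:a" is never treated as a prefix of "ex:ab".
        store.put(SEP.join([name, *(triple[i] for i in perm)]) + SEP)


def match(store: SortedKVStore, pattern: Pattern) -> Iterator[tuple[str, str, str]]:
    """Yield stored triples matching a pattern; None marks an unbound position."""
    bound = {i for i, term in enumerate(pattern) if term is not None}
    for name, perm in ORDERS.items():
        if set(perm[: len(bound)]) == bound:  # bound terms lead this order
            break
    else:
        raise AssertionError("unreachable: the three orders cover all cases")
    prefix = SEP.join([name, *(pattern[i] for i in perm[: len(bound)])]) + SEP
    for key in store.scan_prefix(prefix):
        terms = key.split(SEP)[1:4]  # drop the order name and trailing ""
        triple = [""] * 3
        for position, term in zip(perm, terms):
            triple[position] = term  # undo the permutation
        yield (triple[0], triple[1], triple[2])


def satisfied_fraction(store: SortedKVStore,
                       patterns: list[tuple[str, str, str]],
                       binding: dict[str, str]) -> float:
    """Illustrative approximate-answer score (an assumption): the fraction of
    the query's triple patterns that hold under a candidate variable binding."""
    hits = 0
    for pattern in patterns:
        ground = tuple(binding.get(term, term) for term in pattern)
        if any(term.startswith("?") for term in ground):
            continue  # a variable is still unbound: count as unsatisfied
        if next(match(store, ground), None) is not None:
            hits += 1
    return hits / len(patterns)


store = SortedKVStore()
add_triple(store, "ex:alice", "foaf:knows", "ex:bob")
add_triple(store, "ex:bob", "foaf:knows", "ex:carol")
add_triple(store, "ex:alice", "foaf:name", '"Alice"')

print(list(match(store, (None, "foaf:knows", None))))
# → [('ex:alice', 'foaf:knows', 'ex:bob'), ('ex:bob', 'foaf:knows', 'ex:carol')]

query = [("?x", "foaf:knows", "?y"), ("?y", "foaf:knows", "?z")]
print(satisfied_fraction(store, query, {"?x": "ex:alice", "?y": "ex:bob", "?z": "ex:carol"}))  # → 1.0
print(satisfied_fraction(store, query, {"?x": "ex:alice", "?y": "ex:bob", "?z": "ex:dave"}))  # → 0.5
```

Storing each triple three times trades storage for the guarantee that every pattern, whichever positions are bound, maps to a single contiguous range of keys. That range scan is the basic read primitive that sorted stores such as HBase expose. A partial score like 0.5 shows how a binding that satisfies only some patterns could still be reported as an approximate answer.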