Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce

Authors:
Mohammad Farhan Husain;Pankil Doshi;Latifur Khan;Bhavani Thuraisingham
Affiliations:
University of Texas at Dallas, Dallas, USA 75080;University of Texas at Dallas, Dallas, USA 75080;University of Texas at Dallas, Dallas, USA 75080;University of Texas at Dallas, Dallas, USA 75080
Venue:
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Year:
2009

Abstract

Handling huge amounts of data scalably has long been a matter of concern, and the same is true for semantic web data: current semantic web frameworks lack this ability. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples. We describe our schema for storing RDF data in the Hadoop Distributed File System (HDFS). We also present our algorithms for answering SPARQL queries, which use Hadoop's MapReduce framework to compute the answers. Our results reveal that we can store huge amounts of semantic web data in Hadoop clusters built mostly from cheap, commodity-class hardware and still answer queries fast enough. We conclude that ours is a scalable framework, able to handle large amounts of RDF data efficiently.
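The abstract does not spell out the storage schema or the query algorithm. As a rough illustration only, the Python sketch below assumes one common approach. Triples are partitioned into one bucket per predicate, with in-memory lists standing in for HDFS files. Two SPARQL triple patterns that share one variable are then answered with a reduce-side join, and the map, shuffle and reduce phases are simulated in a single process. All function names and the `ex:` vocabulary are hypothetical. This is not the authors' implementation.

```python
from collections import defaultdict
from itertools import product


def split_by_predicate(triples):
    """Group (subject, object) pairs into one bucket per predicate.

    Assumption: each bucket stands in for a hypothetical HDFS file named
    after its predicate; the paper's actual file layout is not reproduced.
    """
    files = defaultdict(list)
    for s, p, o in triples:
        files[p].append((s, o))
    return dict(files)


def map_phase(files, patterns):
    """Emit (join_value, (pattern_index, other_value)) pairs.

    Each pattern is (predicate, join_position), where join_position is
    "s" or "o" and says which side of the triple binds the shared variable.
    """
    for idx, (pred, join_pos) in enumerate(patterns):
        for s, o in files.get(pred, []):
            key, value = (s, o) if join_pos == "s" else (o, s)
            yield key, (idx, value)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, n_patterns):
    """Emit one row per combination of values, one value from each pattern."""
    for key, tagged in groups.items():
        per_pattern = [[v for i, v in tagged if i == idx] for idx in range(n_patterns)]
        if all(per_pattern):  # inner join: the key must match every pattern
            for combo in product(*per_pattern):
                yield (key, *combo)


def join_on_shared_variable(triples, patterns):
    files = split_by_predicate(triples)
    return sorted(reduce_phase(shuffle(map_phase(files, patterns)), len(patterns)))


# Hypothetical data for: SELECT ?x ?d ?e WHERE { ?x ex:worksFor ?d . ?x ex:email ?e }
data = [
    ("ex:alice", "ex:worksFor", "ex:deptA"),
    ("ex:bob", "ex:worksFor", "ex:deptB"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:carol", "ex:email", "carol@example.org"),
]
rows = join_on_shared_variable(data, [("ex:worksFor", "s"), ("ex:email", "s")])
print(rows)  # [('ex:alice', 'ex:deptA', 'alice@example.org')]
```

Only `ex:alice` appears under both predicates, so only her row survives the reduce step. A query whose patterns join on different variables would need several such jobs chained, with each job reading the previous job's output. How the paper plans those jobs is not described in the abstract.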