Data Intensive Query Processing for Large RDF Graphs Using Cloud Computing Tools

Authors:
Mohammad Farhan Husain;Latifur Khan;Murat Kantarcioglu;Bhavani Thuraisingham
Venue:
CLOUD '10 Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing
Year:
2010

Citing 0
Cited 8

PigSPARQL: mapping SPARQL to Pig Latin
Proceedings of the International Workshop on Semantic Web Information Management

High-performance computing applied to semantic databases
ESWC'11 Proceedings of the 8th extended semantic web conference on The semantic web: research and applications - Volume Part II

An intermediate algebra for optimizing RDF graph pattern matching on MapReduce
ESWC'11 Proceedings of the 8th extended semantic web conference on The semantic web: research and applications - Volume Part II

Efficient processing of RDF graph pattern matching on MapReduce platforms
Proceedings of the second international workshop on Data intensive computing in the clouds

RDFPath: path query processing on large RDF graphs with MapReduce
ESWC'11 Proceedings of the 8th international conference on The Semantic Web

Efficient SPARQL query processing in MapReduce through data partitioning and indexing
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications

RDF data management in the Amazon cloud
Proceedings of the 2012 Joint EDBT/ICDT Workshops

Towards efficient join processing over large RDF graph using MapReduce
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management

Quantified Score

Hi-index	0.01

Abstract

Cloud computing is the newest paradigm in the IT world and hence the focus of new research. Companies hosting cloud computing services face the challenge of handling data-intensive applications. Semantic web technologies can be an ideal candidate to use together with cloud computing tools to provide a solution. These technologies have been standardized by the World Wide Web Consortium (W3C). One such standard is the Resource Description Framework (RDF). With the explosion of semantic web technologies, large RDF graphs are commonplace. Current frameworks do not scale for large RDF graphs. In this paper, we describe a framework that we built using Hadoop, a popular open-source framework for cloud computing, to store and retrieve large numbers of RDF triples. We describe a scheme to store RDF data in the Hadoop Distributed File System (HDFS). We present an algorithm that uses a cost model to generate the best possible query plan for answering a SPARQL Protocol and RDF Query Language (SPARQL) query. We use Hadoop's MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that our framework is scalable and efficient and can easily handle billions of RDF triples, unlike traditional approaches.
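
To make concrete the idea of answering a SPARQL query with MapReduce, the following is a minimal in-memory sketch. It is not the authors' framework, storage scheme, or cost-based planner. It only illustrates how a single join between two triple patterns (`?person worksFor ?org` and `?org locatedIn ?city`) could be expressed as a map, shuffle, and reduce step. The triples, predicate names, and function names are all hypothetical, and the sketch runs in plain Python rather than on Hadoop.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all values are made up.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "initech"),
    ("acme", "locatedIn", "dallas"),
    ("initech", "locatedIn", "austin"),
]


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs for the two triple patterns
    ?person worksFor ?org  and  ?org locatedIn ?city, keyed on ?org."""
    for s, p, o in triples:
        if p == "worksFor":
            yield o, ("person", s)
        elif p == "locatedIn":
            yield s, ("city", o)


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each ?org, pair every matching ?person with every ?city."""
    for org, values in groups.items():
        people = [v for tag, v in values if tag == "person"]
        cities = [v for tag, v in values if tag == "city"]
        for person in people:
            for city in cities:
                yield person, org, city


def run_query(triples):
    """Run the single-join query and return sorted variable bindings."""
    return sorted(reduce_phase(shuffle(map_phase(triples))))


if __name__ == "__main__":
    for binding in run_query(TRIPLES):
        print(binding)
```

A query with more triple patterns would generally need several such joins, possibly spread over multiple jobs. Choosing how to group and order those joins is the kind of decision a cost-based query planner would make.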