Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing

Authors:
Mohammad Husain; James McGlothlin; Mohammad M. Masud; Latifur Khan; Bhavani M. Thuraisingham
Affiliations:
University of Texas at Dallas, Richardson (all authors)
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2011

Abstract

The semantic web is an emerging area that aims to augment human reasoning. Various technologies are being developed in this area, and several have been standardized by the World Wide Web Consortium (W3C); one such standard is the Resource Description Framework (RDF). Semantic web technologies can be used to build efficient and scalable systems for cloud computing. With the explosion of semantic web technologies, large RDF graphs are commonplace, which poses significant challenges for their storage and retrieval. Current frameworks do not scale to large RDF graphs and therefore do not address these challenges. In this paper, we describe a framework that we built using Hadoop to store and retrieve large numbers of RDF triples by exploiting the cloud computing paradigm. We describe a scheme to store RDF data in the Hadoop Distributed File System (HDFS). More than one Hadoop job (the smallest unit of execution in Hadoop) may be needed to answer a query, because a single triple pattern in a query cannot take part in more than one join within a single Hadoop job. To determine the jobs, we present a greedy algorithm that generates a query plan with bounded worst-case cost for answering a SPARQL Protocol and RDF Query Language (SPARQL) query. We use Hadoop's MapReduce framework to answer the queries. Our results show that we can store large RDF graphs in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that, unlike traditional approaches, our framework is scalable and efficient and can handle large amounts of RDF data.
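Storing RDF data in HDFS can be pictured with a minimal sketch, assuming a predicate-based split: every predicate gets its own file that holds only subject-object pairs, so a triple pattern with a fixed predicate needs to read just that one file. This layout, the function name `predicate_split`, and the file-naming rule are illustrative assumptions, not a description of the paper's exact scheme.

```python
from collections import defaultdict


def predicate_split(triples):
    """Group (subject, predicate, object) triples into one bucket per predicate.

    Hypothetical layout: each bucket stands for one HDFS file named after the
    predicate. Only (subject, object) pairs are kept, because the predicate is
    implied by the file name.
    """
    files = defaultdict(list)
    for s, p, o in triples:
        # Replace the prefix colon so the predicate reads as a plain file name
        # (hypothetical naming rule).
        name = p.replace(":", "_")
        files[name].append((s, o))
    return dict(files)


triples = [
    ("ex:alice", "ub:advisor", "ex:bob"),
    ("ex:alice", "rdf:type", "ub:Student"),
    ("ex:carol", "ub:advisor", "ex:bob"),
]
buckets = predicate_split(triples)
```

Here both `ub:advisor` triples land in a bucket named `ub_advisor` as the pairs `("ex:alice", "ex:bob")` and `("ex:carol", "ex:bob")`. Dropping the predicate from each stored line saves space, and a pattern such as `?x ub:advisor ?y` scans one file instead of every triple.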
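Multi-job plans become necessary because a triple pattern may join at most once per Hadoop job. The following is a minimal sketch of a greedy planner under that rule. It assumes each triple pattern is reduced to the set of its variables, and it repeatedly picks the variable shared by the most patterns that have not yet joined in the current job. This is a simplified illustration, not the authors' algorithm, and it does not reproduce their bounded worst-case cost guarantee. `greedy_job_plan` is a hypothetical name.

```python
def greedy_job_plan(patterns):
    """Plan MapReduce jobs for a SPARQL basic graph pattern (hypothetical sketch).

    patterns: list of sets of variable names, one set per triple pattern.
    Returns a list of jobs; each job lists the join variables chosen for it.
    Rule enforced: within one job, a pattern takes part in at most one join.
    """
    current = [frozenset(p) for p in patterns]
    jobs = []
    while len(current) > 1:
        used = set()    # indices of patterns already joined in this job
        job_vars = []   # join variables chosen for this job
        merged = []     # intermediate results produced by this job
        while True:
            # Count, per variable, the patterns not yet used in this job that contain it.
            counts = {}
            for i, p in enumerate(current):
                if i in used:
                    continue
                for v in p:
                    counts[v] = counts.get(v, 0) + 1
            candidates = [v for v, c in counts.items() if c >= 2]
            if not candidates:
                break
            # Greedy choice: the variable shared by the most unused patterns
            # (ties go to the alphabetically first variable).
            best = max(sorted(candidates), key=lambda v: counts[v])
            members = [i for i, p in enumerate(current) if i not in used and best in p]
            used.update(members)
            # The join result carries the union of its inputs' variables. A real
            # join would match on every shared variable; only `best` is recorded.
            merged.append(frozenset().union(*(current[i] for i in members)))
            job_vars.append(best)
        if not job_vars:
            raise ValueError("query graph is disconnected; a cross product would be needed")
        # Patterns untouched in this job carry over unchanged to the next job.
        current = merged + [p for i, p in enumerate(current) if i not in used]
        jobs.append(job_vars)
    return jobs


plan = greedy_job_plan([{"X"}, {"Y"}, {"X", "Y"}, {"Y", "Z"}, {"X", "Z"}])
```

In this example, the five patterns have the variable sets {X}, {Y}, {X, Y}, {Y, Z} and {X, Z}. The first job joins three patterns on X and the remaining two on Y. That leaves two intermediate results sharing Y, so a second job finishes the plan, giving `[["X", "Y"], ["Y"]]`. Choosing the most widely shared variable first tends to shrink the number of intermediate results quickly, which is what keeps the job count low.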