Cloud computing is the newest paradigm in the IT world and hence the focus of new research. Companies hosting cloud computing services face the challenge of handling data-intensive applications. Semantic web technologies, used together with cloud computing tools, are a promising candidate for a solution. These technologies have been standardized by the World Wide Web Consortium (W3C); one such standard is the Resource Description Framework (RDF). With the explosion of semantic web technologies, large RDF graphs are commonplace, and current frameworks do not scale to them. In this paper, we describe a framework that we built using Hadoop, a popular open-source framework for cloud computing, to store and retrieve large numbers of RDF triples. We describe a scheme for storing RDF data in the Hadoop Distributed File System (HDFS). We present an algorithm that uses a cost model to generate the best possible query plan for answering a SPARQL Protocol and RDF Query Language (SPARQL) query. We use Hadoop's MapReduce framework to answer the queries. Our results show that large RDF graphs can be stored in Hadoop clusters built with cheap commodity-class hardware. Furthermore, we show that our framework is scalable and efficient and can easily handle billions of RDF triples, unlike traditional approaches.
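To make the idea of answering a SPARQL query with MapReduce more concrete, the following is a minimal sketch in plain Python, not the framework described above. It simulates the map, shuffle, and reduce steps in memory for a query with two triple patterns joined on a shared variable. The data, the `ex:` names, and all function names are hypothetical and chosen only for illustration; the storage scheme and the cost-based plan generation of the paper are not modeled here.

```python
from collections import defaultdict
from itertools import product

# Toy data set; every name below is made up for illustration.
TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:initech"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:initech", "ex:locatedIn", "ex:austin"),
]


def map_phase(triple):
    """Emit (join key, tagged value) pairs for the two patterns
    ?person ex:worksFor ?org  and  ?org ex:locatedIn ?city.
    The join variable ?org becomes the key."""
    s, p, o = triple
    if p == "ex:worksFor":
        yield o, ("person", s)
    elif p == "ex:locatedIn":
        yield s, ("city", o)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(org, values):
    """Join both sides that share the same ?org binding."""
    persons = [v for tag, v in values if tag == "person"]
    cities = [v for tag, v in values if tag == "city"]
    for person, city in product(persons, cities):
        yield {"person": person, "org": org, "city": city}


def run_query(triples):
    """Run the simulated single-job join over an in-memory triple list."""
    pairs = [kv for t in triples for kv in map_phase(t)]
    groups = shuffle(pairs)
    return [row for org, vals in groups.items()
            for row in reduce_phase(org, vals)]


results = run_query(TRIPLES)
```

In this sketch one job suffices because both patterns share a single variable. A query with several distinct join variables would generally need more than one such job, which is where a choice among alternative query plans becomes relevant.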