With the rapid growth of semantic data, analyzing such large-scale data has become a hot topic. Traditional triple stores deployed on a single machine have proven effective for storage and retrieval of RDF data. However, their scalability is limited, and they cannot handle the ever-growing volume of billions of triples. On the other hand, Hadoop is an open-source project that provides HDFS as a distributed file system and MapReduce as a framework for distributed processing, and it has been shown to perform well for large-scale data analysis. In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
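To make the combination concrete, the following is a minimal sketch, not the paper's actual HadoopRDF implementation, of how a SPARQL basic graph pattern over RDF triples can be evaluated as a MapReduce-style join: the map phase tags each subject that matches a triple pattern, and the reduce phase keeps subjects matching all patterns. The triples, predicates, and query are hypothetical illustrations.

```python
from collections import defaultdict

# Toy RDF dataset as (subject, predicate, object) triples (hypothetical).
triples = [
    ("alice", "rdf:type", ":Person"),
    ("bob",   "rdf:type", ":Person"),
    ("alice", ":worksAt", ":Org1"),
    ("carol", ":worksAt", ":Org1"),
]

# Query sketch: SELECT ?p WHERE { ?p rdf:type :Person . ?p :worksAt :Org1 }

def map_phase(triple):
    """Emit (join-key, tag) pairs for each triple pattern the triple matches."""
    s, p, o = triple
    if p == "rdf:type" and o == ":Person":
        yield (s, "pattern1")
    if p == ":worksAt" and o == ":Org1":
        yield (s, "pattern2")

def reduce_phase(key, tags):
    """A subject is a query answer only if every pattern matched it."""
    if {"pattern1", "pattern2"} <= set(tags):
        yield key

# Simulate the shuffle step: group map outputs by join key.
groups = defaultdict(list)
for t in triples:
    for key, tag in map_phase(t):
        groups[key].append(tag)

results = sorted(r for key, tags in groups.items() for r in reduce_phase(key, tags))
print(results)  # ['alice']
```

In a real Hadoop deployment the shuffle is performed by the framework across machines, and multi-pattern queries may require several chained MapReduce jobs, which is the scalability trade-off the system design addresses.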