HadoopRDF: a scalable semantic data analytical engine

  • Authors:
  • Jin-Hang Du;Hao-Fen Wang;Yuan Ni;Yong Yu

  • Affiliations:
  • Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China;Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China;IBM China Research Lab, China;Apex Data and Knowledge Management Lab, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • ICIC'12 Proceedings of the 8th international conference on Intelligent Computing Theories and Applications
  • Year:
  • 2012

Abstract

With the rapid growth of semantic data, analyzing it at large scale has become an active research topic. Traditional triple stores deployed on a single machine have proved effective for storing and retrieving RDF data. However, their scalability is limited, and they cannot handle billions of ever-growing triples. Hadoop, on the other hand, is an open-source project that provides HDFS as a distributed file storage system and MapReduce as a computing framework for distributed processing, and it has been shown to perform well for large-scale data analysis. In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
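
To make the idea of pairing distributed processing with triple-pattern queries more concrete, the following is a minimal single-process sketch in Python. It is not HadoopRDF's implementation: the abstract does not say how triples are partitioned or how queries are planned, so the subject-hash partitioning, the function names (`partition_by_subject`, `match_pattern`, `run_query`), and the sample data are all assumptions made for illustration. Each partition stands in for a node holding part of the data, the per-partition matching plays the role of a map step, and the merge plays the role of a reduce step.

```python
from zlib import crc32


def partition_by_subject(triples, num_partitions):
    """Assign each (subject, predicate, object) triple to a partition.

    Assumption: partitioning by a hash of the subject. The abstract does
    not state which scheme HadoopRDF actually uses.
    """
    partitions = [[] for _ in range(num_partitions)]
    for s, p, o in triples:
        index = crc32(s.encode("utf-8")) % num_partitions
        partitions[index].append((s, p, o))
    return partitions


def match_pattern(partition, pattern):
    """Map-like step: match one triple pattern against one partition.

    Terms starting with '?' are variables; all other terms must match
    exactly. Returns a list of variable-binding dictionaries.
    """
    results = []
    for triple in partition:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def run_query(triples, pattern, num_partitions=3):
    """Match the pattern on every partition, then merge the results
    (a reduce-like step)."""
    merged = []
    for partition in partition_by_subject(triples, num_partitions):
        merged.extend(match_pattern(partition, pattern))
    return merged


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    sample = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "age", "30"),
    ]
    for row in run_query(sample, ("?s", "knows", "?o")):
        print(row)
```

The pattern `("?s", "knows", "?o")` corresponds roughly to the SPARQL triple pattern `?s :knows ?o`. A real deployment would also need joins across patterns, which this sketch leaves out.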