With the rapid growth of semantic data, analyzing such large-scale data has become a hot topic. Traditional triple stores deployed on a single machine have proven effective for storage and retrieval of RDF data. However, their scalability is limited, and they cannot handle the ever-growing volume of billions of triples. On the other hand, Hadoop is an open-source project that provides HDFS as a distributed file system and MapReduce as a framework for distributed processing, and it has been shown to perform well for large-scale data analysis. In this paper, we propose HadoopRDF, a system that combines both worlds (triple stores and Hadoop) to provide a scalable data analysis service for RDF data. It benefits from the scalability of Hadoop and from the ability of traditional triple stores to support flexible analytical queries such as SPARQL. Experimental evaluation results show the effectiveness and efficiency of the approach.
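To make the combination concrete, the following is a minimal sketch, not the paper's actual HadoopRDF implementation, of how a SPARQL basic graph pattern over RDF triples can be evaluated as a MapReduce-style join: the map phase tags each subject that matches a triple pattern, and the reduce phase keeps subjects matching all patterns. The triples, predicates, and query are hypothetical illustrations.

```python
from collections import defaultdict

# Toy RDF dataset as (subject, predicate, object) triples (hypothetical).
triples = [
    ("alice", "rdf:type", ":Person"),
    ("bob",   "rdf:type", ":Person"),
    ("alice", ":worksAt", ":Org1"),
    ("carol", ":worksAt", ":Org1"),
]

# Query sketch: SELECT ?p WHERE { ?p rdf:type :Person . ?p :worksAt :Org1 }

def map_phase(triple):
    """Emit (join-key, tag) pairs for each triple pattern the triple matches."""
    s, p, o = triple
    if p == "rdf:type" and o == ":Person":
        yield (s, "pattern1")
    if p == ":worksAt" and o == ":Org1":
        yield (s, "pattern2")

def reduce_phase(key, tags):
    """A subject is a query answer only if every pattern matched it."""
    if {"pattern1", "pattern2"} <= set(tags):
        yield key

# Simulate the shuffle step: group map outputs by join key.
groups = defaultdict(list)
for t in triples:
    for key, tag in map_phase(t):
        groups[key].append(tag)

results = sorted(r for key, tags in groups.items() for r in reduce_phase(key, tags))
print(results)  # ['alice']
```

In a real Hadoop deployment the shuffle is performed by the framework across machines, and multi-pattern queries may require several chained MapReduce jobs, which is the scalability trade-off the system design addresses.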