SPIDER: a system for scalable, parallel / distributed evaluation of large-scale RDF data

  • Authors:
  • Hyunsik Choi; Jihoon Son; YongHyun Cho; Min Kyoung Sung; Yon Dohn Chung

  • Affiliations:
  • College of Information and Communication, Korea University, Seoul, South Korea (all authors)

  • Venue:
  • Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09)
  • Year:
  • 2009

Abstract

RDF is a data model for representing labeled directed graphs, and it serves as a core building block of the Semantic Web. Owing to its flexibility and broad applicability, RDF has been adopted in applications such as the Semantic Web, bioinformatics, and social networks, where large-scale graph datasets are common. However, existing techniques do not manage such datasets effectively. In this paper, we present SPIDER, a scalable, efficient query processing system for RDF data built on Hadoop, the well-known parallel/distributed computing framework. SPIDER consists of two major modules: (1) the graph data loader and (2) the graph query processor. The loader analyzes and dissects the RDF data and places the resulting partitions across multiple servers. The query processor parses a user query and distributes subqueries to the cluster nodes; the subquery results from the servers are then gathered (and refined if necessary) and delivered to the user. Both modules utilize the MapReduce framework of Hadoop, and the system supports a subset of the SPARQL query language. This prototype will serve as a foundation for developing real applications over large-scale RDF graph data.
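The abstract's load-then-scatter/gather design can be illustrated with a minimal single-process sketch. This is purely hypothetical: the paper does not specify SPIDER's partitioning scheme, so hashing triples by subject and matching one triple pattern per partition stand in for the loader and query processor here (all names below are illustrative, not SPIDER's API).

```python
import hashlib

NUM_PARTITIONS = 4  # stands in for the number of cluster nodes


def partition_of(subject: str) -> int:
    """Map a subject URI to a partition (analogous to a map-side shuffle key)."""
    digest = hashlib.md5(subject.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


def load(triples):
    """Loader sketch: place each (s, p, o) triple into its subject's partition."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    for s, p, o in triples:
        partitions[partition_of(s)].append((s, p, o))
    return partitions


def query(partitions, pattern):
    """Query-processor sketch: run one triple pattern on every partition and
    merge the matches. None in the pattern plays the role of a SPARQL variable."""
    ps, pp, po = pattern
    results = []
    for part in partitions:  # scatter: one subquery per partition/node
        for s, p, o in part:
            if ps in (None, s) and pp in (None, p) and po in (None, o):
                results.append((s, p, o))
    return results  # gather: union of per-partition matches


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "rdf:type", "foaf:Person"),
]
parts = load(triples)
print(query(parts, ("ex:alice", None, None)))
```

In a real Hadoop deployment the loader and the pattern matcher would each run as MapReduce jobs over HDFS-resident partitions rather than in-memory lists; the sketch only shows the data flow.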