PigSPARQL: mapping SPARQL to Pig Latin

Authors:
Alexander Schätzle; Martin Przyjaciel-Zablocki; Georg Lausen
Affiliations:
University of Freiburg, Germany; University of Freiburg, Germany; University of Freiburg, Germany
Venue:
Proceedings of the International Workshop on Semantic Web Information Management
Year:
2011

Abstract

In this paper we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform we use Apache Hadoop, an open-source implementation of Google's MapReduce for massively parallel computation on a cluster of machines. We introduce PigSPARQL, a system that processes complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research, and the resulting Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate SPARQL query processing with PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that PigSPARQL enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
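To make the idea of mapping a SPARQL query onto dataflow operators concrete, here is a minimal in-memory sketch. It assumes a query consisting of a single basic graph pattern and a simplified mapping in which each triple pattern becomes a selection (comparable to a Pig `FILTER`) and consecutive patterns are combined by an equi-join on their shared variables (comparable to a Pig `JOIN ... BY`). This is an illustrative assumption, not PigSPARQL's documented translation; the triples, prefixes (`ex:`, `bench:`, `dc:`), and helper names such as `eval_bgp` are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Roughly corresponding Pig Latin for the example query below
# (hypothetical file name, whitespace-separated triples assumed):
#   triples = LOAD 'triples.txt' USING PigStorage(' ') AS (s:chararray, p:chararray, o:chararray);
#   t1 = FILTER triples BY p == 'rdf:type' AND o == 'bench:Article';
#   t2 = FILTER triples BY p == 'dc:creator';
#   j1 = JOIN t1 BY s, t2 BY s;


def is_var(term: str) -> bool:
    """SPARQL variables start with '?'."""
    return term.startswith("?")


def match_pattern(triples: List[Triple], pattern: Triple) -> List[Binding]:
    """Selection (like FILTER) on constant positions, then projection onto the variables."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable occurring twice in one pattern must bind to the same value.
                if term in binding and binding[term] != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Nested-loop equi-join on shared variables (a cross product if none are shared)."""
    out: List[Binding] = []
    for lhs in left:
        for rhs in right:
            shared = lhs.keys() & rhs.keys()
            if all(lhs[v] == rhs[v] for v in shared):
                out.append({**lhs, **rhs})
    return out


def eval_bgp(triples: List[Triple], bgp: List[Triple]) -> List[Binding]:
    """Evaluate the patterns left to right, joining each new pattern into the result."""
    result = match_pattern(triples, bgp[0])
    for pattern in bgp[1:]:
        result = join(result, match_pattern(triples, pattern))
    return result


TRIPLES: List[Triple] = [
    ("ex:article1", "rdf:type", "bench:Article"),
    ("ex:article1", "dc:creator", "ex:alice"),
    ("ex:article2", "rdf:type", "bench:Article"),
    ("ex:article2", "dc:creator", "ex:bob"),
    ("ex:inproc1", "rdf:type", "bench:Inproceedings"),
    ("ex:inproc1", "dc:creator", "ex:alice"),
]

# SELECT * WHERE { ?doc rdf:type bench:Article . ?doc dc:creator ?author }
BGP: List[Triple] = [
    ("?doc", "rdf:type", "bench:Article"),
    ("?doc", "dc:creator", "?author"),
]

result = eval_bgp(TRIPLES, BGP)
print(result)
```

The join order here is simply left to right and the join runs in memory; on Hadoop, each Pig `JOIN` would instead be compiled into MapReduce work over partitioned data, which is where join ordering and data skew start to matter.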