In order to exploit the growing amount of RDF data in decision-making, there is an increasing demand for analytics-style processing of such data. RDF data is modeled as a labeled graph that represents a collection of binary relations (triples). In this context, analytical queries can be interpreted as consisting of three main constructs, namely pattern matching, grouping, and aggregation. Unlike in traditional OLAP systems, where data is already suitably organized, such queries require several join operations to reassemble the triples into the n-ary relations relevant to the given query. MapReduce-based parallel processing systems like Pig have been successful at processing large-scale analytical workloads. However, these systems offer only relational-algebra-style operators, which require an iterative n-tuple reassembly process in which intermediate results must be materialized. This leads to high I/O costs that negatively impact performance. In this paper, we propose user-defined functions (UDFs) that (i) refactor analytical processing on RDF graphs in a way that enables more parallelized processing and (ii) perform look-ahead processing to reduce the cost of subsequent operators in the query execution plan. These functions have been integrated into the Pig Latin function library, and the experimental results show up to a 50% improvement in execution times for certain classes of queries. An important impact of this work is that it could serve as the foundation for additional physical operators in systems such as Pig for more efficient graph processing.
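The contrast between iterative, join-based n-tuple reassembly and a more parallel-friendly refactoring can be illustrated with a small, single-machine sketch. This is not the paper's UDFs. It assumes a star-shaped pattern in which every triple pattern shares the same subject, and it uses a hypothetical toy dataset (TRIPLES) and hypothetical helper names. reassemble_by_joins performs one join per additional property and materializes an intermediate relation after each step, which plays the role of the per-job materialization in a MapReduce plan. reassemble_by_grouping instead groups all relevant triples by subject in a single pass, which corresponds to a single shuffle. A grouping-and-aggregation step (total price per vendor) then completes the analytical query.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]   # (subject, property, object)
Row = Tuple[str, Tuple[str, ...]]  # (subject, values in property order)

# Hypothetical toy data: products with a vendor and a price.
TRIPLES: List[Triple] = [
    ("p1", "vendor", "v1"), ("p1", "price", "10"),
    ("p2", "vendor", "v1"), ("p2", "price", "30"),
    ("p3", "vendor", "v2"), ("p3", "price", "5"),
    ("p4", "vendor", "v2"),  # no price, so p4 does not match the pattern
]

def reassemble_by_joins(triples: Sequence[Triple], props: Sequence[str]) -> List[Row]:
    """Relational-style reassembly: one join per extra property, with an
    intermediate result materialized after every join."""
    def select(prop: str) -> List[Tuple[str, str]]:
        return [(s, o) for s, p, o in triples if p == prop]

    result: List[Row] = [(s, (o,)) for s, o in select(props[0])]
    for prop in props[1:]:
        index: Dict[str, List[str]] = defaultdict(list)
        for s, o in select(prop):
            index[s].append(o)
        # New intermediate relation (analogous to one MapReduce job's output).
        result = [(s, vals + (o,)) for s, vals in result for o in index.get(s, [])]
    return result

def reassemble_by_grouping(triples: Sequence[Triple], props: Sequence[str]) -> List[Row]:
    """Single pass: group triples by subject, then emit tuples only for
    subjects that have every required property."""
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in props:
            groups[s][p].append(o)
    result: List[Row] = []
    for s, by_prop in groups.items():
        if all(p in by_prop for p in props):
            combos: List[Tuple[str, ...]] = [()]
            for p in props:  # cross product handles multi-valued properties
                combos = [c + (o,) for c in combos for o in by_prop[p]]
            result.extend((s, c) for c in combos)
    return result

def total_price_per_vendor(rows: List[Row]) -> Dict[str, float]:
    """Grouping and aggregation over the reassembled (vendor, price) tuples."""
    totals: Dict[str, float] = defaultdict(float)
    for _, (vendor, price) in rows:
        totals[vendor] += float(price)
    return dict(totals)

if __name__ == "__main__":
    rows = reassemble_by_grouping(TRIPLES, ["vendor", "price"])
    print(rows)
    print(total_price_per_vendor(rows))
```

Both strategies yield the same tuples for p1, p2, and p3. The grouping variant needs no intermediate relation between properties, and that property is what makes it attractive in a distributed setting where each materialization costs I/O.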