Broadened adoption of the Linking Open Data tenets has led to a significant surge in the amount of Semantic Web data, particularly RDF data. This has made scalable processing of RDF data a central concern in the Semantic Web research community. The RDF data model is fine-grained and represents relationships as binary relations. Consequently, answering queries over RDF data (typically graph pattern matching queries) requires several join operations to reassemble related data. While MapReduce-based processing is emerging as the de facto paradigm for processing large-scale data, it is known to be inefficient for join-intensive workloads. In addition, most existing techniques for optimizing RDF data processing do not transfer well to the MapReduce model and often require significant lead time for pre-processing. Such a requirement may be undesirable in on-demand cloud database scenarios, where the goal is to reduce the Time-To-Result (TTR). In this position paper, we argue that some of these challenges can be overcome by rethinking the operators for graph pattern processing and by adopting dynamic optimization techniques that exploit information from previous execution steps to eliminate intermediate results that are irrelevant to future execution steps. We present preliminary evaluation results for the proposed techniques.
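To make the join-intensive nature of graph pattern matching concrete, the following is a minimal single-machine sketch in Python. It is not the operators or the optimization proposed in the paper, and it does not use MapReduce. The toy triples, the predicate names, and the helpers `scan`, `hash_join`, and `match_pattern` are all hypothetical. The sketch splits the triples into one binary relation per predicate, joins those relations to answer a three-pattern query, and optionally applies a generic semi-join style filter that drops bindings that cannot contribute to a later join step.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("p1", "name", "Widget"),
    ("p1", "price", "10"),
    ("p1", "vendor", "v1"),
    ("p2", "name", "Gadget"),
    ("p2", "vendor", "v2"),
    ("p3", "name", "Gizmo"),
    ("p3", "price", "30"),
    ("v1", "country", "US"),
    ("v2", "country", "DE"),
]


def scan(triples, predicate):
    """Return the binary relation [(subject, object)] for one predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


def hash_join(left, right, left_key, right_key):
    """Join two lists of variable bindings (dicts) on equal key values."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def match_pattern(triples, prune=True):
    """Match the pattern: ?x name ?n . ?x vendor ?v . ?v country ?c

    Returns the matching bindings and the number of 'name' bindings
    that were fed into the first join.
    """
    names = [{"x": s, "n": o} for s, o in scan(triples, "name")]
    vendors = [{"x": s, "v": o} for s, o in scan(triples, "vendor")]
    countries = [{"v": s, "c": o} for s, o in scan(triples, "country")]

    if prune:
        # Assumption for illustration: a semi-join style filter that
        # discards 'name' bindings whose subject has no 'vendor' triple,
        # since they cannot survive the next join step.
        keep = {row["x"] for row in vendors}
        names = [row for row in names if row["x"] in keep]

    step1 = hash_join(names, vendors, "x", "x")     # join on ?x
    step2 = hash_join(step1, countries, "v", "v")   # join on ?v
    return step2, len(names)
```

Calling `match_pattern(TRIPLES, prune=True)` and `match_pattern(TRIPLES, prune=False)` yields the same matches, but the pruned run carries fewer bindings into the first join. In a MapReduce setting each join typically costs a full job, so shrinking intermediate inputs in this way matters more than it does in this in-memory toy.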