Many queries on RDF datasets involve triple patterns whose properties are multi-valued. When such queries are processed using flat data models and their associated algebras, intermediate results can contain substantial redundancy. On MapReduce-based platforms such as Hadoop, this redundancy can account for a non-trivial share of the overall disk I/O, sorting, and network data transfer costs. Moreover, when MapReduce workflows consist of multiple cycles, as is typical when processing RDF graph pattern queries, these costs can compound across cycles. Such overhead may be avoidable if nested data models and algebras are used instead. In this short paper, we present ongoing research into the use of a Nested TripleGroup data model and Algebra (NTGA) for MapReduce-based RDF graph processing. The NTGA operators fully subscribe to the NTG data model. This contrasts with systems such as Pig, whose data model supports some nesting but whose algebra is primarily tuple-based, requiring nested objects to be flattened before other operators can be applied. Because NTGA fully subscribes to the nested data model, it can also support different unnesting strategies, including delayed and partial unnesting. We present a preliminary evaluation of these strategies for the efficient management of multi-valued properties while processing graph pattern queries in Apache Pig.
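The redundancy argument can be illustrated with a small in-memory sketch. This is not NTGA and does not run on Pig or Hadoop. The toy triples and the function names `flat_star_join`, `triplegroups`, `unnest` and `partial_unnest` are hypothetical, chosen only to contrast a tuple-based evaluation of a star pattern with a subject-grouped (triplegroup-style) representation that is unnested lazily or partially.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data (subject, property, object); not taken from the paper.
TRIPLES = [
    ("p1", "type", "Product"),
    ("p1", "label", "a"), ("p1", "label", "b"), ("p1", "label", "c"),
    ("p1", "feature", "f1"), ("p1", "feature", "f2"),
    ("p1", "feature", "f3"), ("p1", "feature", "f4"),
]

# Star pattern: all properties share the same subject variable.
PATTERN = ["type", "label", "feature"]


def flat_star_join(triples, props):
    """Tuple-based evaluation: one flat row per combination of property values."""
    by_sp = defaultdict(list)
    for s, p, o in triples:
        by_sp[(s, p)].append(o)
    rows = []
    for s in sorted({s for s, _, _ in triples}):
        value_lists = [by_sp.get((s, p), []) for p in props]
        # product() over the value lists yields the cross product that
        # repeats the subject and every single-valued property per row.
        for combo in product(*value_lists):
            rows.append((s,) + combo)
    return rows


def triplegroups(triples, props):
    """Nested evaluation: one group per subject holding each property's values once."""
    groups = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in props:
            groups[s][p].append(o)
    # Keep only subjects that match every property of the star pattern.
    return {s: dict(g) for s, g in groups.items() if all(p in g for p in props)}


def unnest(group, s, props):
    """Delayed unnesting (illustrative): expand a group into flat rows only on demand."""
    return [(s,) + combo for combo in product(*(group[p] for p in props))]


def partial_unnest(group, prop):
    """Partial unnesting (illustrative): expand one property, keep the others nested."""
    return [{**group, prop: [v]} for v in group[prop]]
```

For subject `p1`, with one type, three labels and four features, `flat_star_join` produces 3 × 4 = 12 rows that each repeat the subject and the type, whereas the group returned by `triplegroups` stores those 8 object values once. `unnest` recovers exactly the 12 flat rows if a later step needs them, and `partial_unnest(group, "label")` yields three groups that each still hold all four features. In a multi-cycle MapReduce workflow, the flat rows are what would be written, sorted and shuffled between cycles, which is why deferring this expansion can matter.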