Existing MapReduce systems support relational-style join operators, which translate multi-join query plans into several MapReduce cycles. This leads to high I/O and communication costs because of the multiple data transfer steps between the map and reduce phases. This cost is prohibitive for RDF graph pattern matching queries, which typically involve several join operations. SPARQL graph pattern matching in particular is dominated by joins and is unlikely to be processed efficiently using existing techniques. In this paper, we propose an approach for optimizing graph pattern matching by reinterpreting certain join tree structures as grouping operations. This enables a greater degree of parallelism in join processing, resulting in "bushier" query execution plans with fewer MapReduce cycles. The approach requires that intermediate results be managed as sets of groups of triples, or TripleGroups. We therefore propose a data model and algebra, the Nested TripleGroup Algebra (NTGA), for capturing and manipulating TripleGroups. We discuss its relationship with the traditional relational-style algebra used in Apache Pig. We present a comparative performance evaluation of the traditional Pig approach and RAPID+ (Pig extended with NTGA) for graph pattern matching queries on the BSBM benchmark dataset. Results show up to 60% performance improvement of our approach over traditional Pig for some tasks.
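
The idea of reinterpreting a join structure as a grouping operation can be illustrated with a small, single-machine sketch. This is not the NTGA operator set or the RAPID+ implementation; the triples, property names, and function names below are hypothetical and chosen only for illustration. It contrasts a star-shaped pattern evaluated with one self-join per property against the same pattern evaluated with a single grouping pass over subjects, which keeps the matches nested as groups of triples.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy triples (subject, property, object); not BSBM data.
TRIPLES = [
    ("p1", "label", "Widget"),
    ("p1", "price", "10"),
    ("p1", "vendor", "v1"),
    ("p2", "label", "Gadget"),
    ("p3", "label", "Gizmo"),
    ("p3", "price", "7"),
    ("p3", "price", "8"),
]


def star_match_by_joins(triples, props):
    """Relational style: one self-join on the subject per extra property.

    In a MapReduce setting each of these joins would typically cost
    its own cycle.
    """
    rows = [(s, o) for s, p, o in triples if p == props[0]]
    for prop in props[1:]:
        right = [(s, o) for s, p, o in triples if p == prop]
        rows = [row + (o,) for row in rows for s, o in right if row[0] == s]
    return sorted(rows)


def star_match_by_grouping(triples, props):
    """Grouping style: a single pass groups triples by subject, then keeps
    only the groups that contain every required property."""
    groups = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p in props:
            groups[s][p].append(o)
    return {
        s: {p: sorted(objs) for p, objs in g.items()}
        for s, g in groups.items()
        if all(p in g for p in props)
    }


def flatten(groups, props):
    """Expand nested groups into flat rows, for comparison with the joins."""
    rows = []
    for s, g in groups.items():
        for combo in product(*(g[p] for p in props)):
            rows.append((s,) + combo)
    return sorted(rows)


if __name__ == "__main__":
    pattern = ["label", "price"]
    print(star_match_by_joins(TRIPLES, pattern))
    print(star_match_by_grouping(TRIPLES, pattern))
```

In this sketch the grouped form stores each subject's matching triples once, and the flat join result can be recovered on demand with `flatten`. Cutting the number of passes is the kind of saving the abstract attributes to fewer MapReduce cycles.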