A bridging model for parallel computation
Communications of the ACM
A query language and optimization techniques for unstructured data
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Data on the Web: from relations to semistructured data and XML
Data on the Web: from relations to semistructured data and XML
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Distributed query evaluation on semistructured data
ACM Transactions on Database Systems (TODS)
Partial Evaluation of Computation Process—AnApproach to a Compiler-Compiler
Higher-Order and Symbolic Computation
UnQL: a query language and algebra for semistructured data based on structural recursion
The VLDB Journal — The International Journal on Very Large Data Bases
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Fault-tolerant computation of distributed regular path queries
Theoretical Computer Science
Building a high-level dataflow system on top of Map-Reduce: the Pig experience
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
What is Twitter, a social network or a news media?
Proceedings of the 19th international conference on World wide web
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
HAMA: An Efficient Matrix Computation with the MapReduce Framework
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Signal/collect: graph algorithms for the (semantic) web
ISWC'10 Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I
Performance guarantees for distributed reachability queries
Proceedings of the VLDB Endowment
Ligra: a lightweight graph processing framework for shared memory
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Graph has emerged as a powerful data structure to describe various data. Query evaluation on distributed graphs takes much cost due to the complexity of links among sites. Dan Suciu has proposed algorithms for query evaluation on semistructured data that is a rooted, edge-labeled graph, and algorithms are proved to be efficient in terms of communication steps and data transferring during the evaluation. However, one disadvantage is that communication data are collected to one single site, which leads to a bottleneck in the evaluation for real-life data. In this paper, we propose two algorithms to improve Dan Suciu's algorithms: one-pass algorithm is to significantly reduce a large amount of redundant data in the evaluation, and iter_acc algorithm is to resolve the bottleneck. Then, we design an efficient implementation with only one MapReduce job for our algorithms in Hadoop environment by utilizing features of Hadoop file system. Experiments on cloud system show that one-pass algorithm can detect and remove 50% of data being redundant in the evaluation process on YouTube and DBLP datasets, and iter_acc algorithm is running without the bottleneck even when we double the size of input data.