Efficient query evaluation on distributed graphs with Hadoop environment

Authors:
Le-Duc Tung;Quyet Nguyen-Van;Zhenjiang Hu
Affiliations:
The Graduate University for Advanced Studies, Japan;Hung Yen University of Technology and Education, Vietnam;National Institute of Informatics, Japan
Venue:
Proceedings of the Fourth Symposium on Information and Communication Technology
Year:
2013

Abstract

Graphs have emerged as a powerful data structure for describing many kinds of data. Evaluating queries on distributed graphs is costly because of the complexity of the links among sites. Dan Suciu proposed algorithms for query evaluation on semistructured data modeled as a rooted, edge-labeled graph; these algorithms are proved to be efficient in terms of the number of communication steps and the amount of data transferred during the evaluation. One disadvantage, however, is that the communication data are collected at a single site, which creates a bottleneck when evaluating queries on real-life data. In this paper, we propose two algorithms that improve on Dan Suciu's algorithms: the one-pass algorithm significantly reduces the amount of redundant data in the evaluation, and the iter_acc algorithm resolves the bottleneck. We then design an efficient implementation of our algorithms that uses only one MapReduce job in a Hadoop environment by exploiting features of the Hadoop file system. Experiments on a cloud system show that the one-pass algorithm can detect and remove 50% of the data as redundant in the evaluation process on the YouTube and DBLP datasets, and that the iter_acc algorithm runs without the bottleneck even when the size of the input data is doubled.
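
To make the setting concrete, the following is a minimal Python sketch of query evaluation on a rooted, edge-labeled graph split across sites. It is not the paper's one-pass or iter_acc algorithm, whose details the abstract does not give. It assumes a simplified partial-evaluation scheme in which each site evaluates the query from every (input node, automaton state) pair and a single site then assembles the partial answers. The query is assumed to be a path expression given as a deterministic automaton, and all names (`local_eval`, `evaluate`, `fragments`, `owner`, `dfa`) are hypothetical.

```python
from collections import deque

def local_eval(edges, owner, site, dfa, inputs):
    """Evaluate the query inside one fragment from each (input node, state) pair."""
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))
    partial = {}
    for start in inputs:
        hits, exits = set(), set()
        seen, queue = {start}, deque([start])
        while queue:
            node, state = queue.popleft()
            if state in dfa["accept"]:
                hits.add(node)                  # local answer node
            for label, nxt in adj.get(node, []):
                nstate = dfa["delta"].get((state, label))
                if nstate is None:
                    continue                    # label does not match the query
                pair = (nxt, nstate)
                if owner[nxt] != site:
                    exits.add(pair)             # evaluation continues at another site
                elif pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
        partial[start] = (hits, exits)
    return partial

def evaluate(fragments, owner, root, dfa):
    states = {dfa["start"]} | set(dfa["delta"].values()) | {q for q, _ in dfa["delta"]}
    # Input nodes of a site: the root plus targets of edges coming from other sites.
    input_nodes = {s: set() for s in fragments}
    input_nodes[owner[root]].add(root)
    for s, edges in fragments.items():
        for _, _, v in edges:
            if owner[v] != s:
                input_nodes[owner[v]].add(v)
    # Step 1: every site works independently (sequential here for simplicity).
    accessibility = {}
    for s, edges in fragments.items():
        inputs = [(n, q) for n in input_nodes[s] for q in states]
        accessibility.update(local_eval(edges, owner, s, dfa, inputs))
    # Step 2: one site collects all partial answers and follows them from the root.
    first = (root, dfa["start"])
    result, seen, queue = set(), {first}, deque([first])
    while queue:
        hits, exits = accessibility[queue.popleft()]
        result |= hits
        for pair in exits - seen:
            seen.add(pair)
            queue.append(pair)
    return result, len(accessibility), len(seen)

# Hypothetical two-site graph: r -x-> a -y-> b -z-> c, with b and c stored at site B.
fragments = {"A": [("r", "x", "a"), ("a", "y", "b")], "B": [("b", "z", "c")]}
owner = {"r": "A", "a": "A", "b": "B", "c": "B"}
# Automaton for the path query x.y.z
dfa = {"start": 0, "accept": {3}, "delta": {(0, "x"): 1, (1, "y"): 2, (2, "z"): 3}}

result, computed, used = evaluate(fragments, owner, "r", dfa)
print(result, computed, used)
```

In this toy run, the sites compute partial answers for 8 (node, state) pairs, but only 2 of them are reachable from the root. This hints at why a scheme of this kind can ship redundant data, and why assembling everything at one site can become a bottleneck. Whether this is exactly the redundancy and bottleneck the paper addresses is an assumption of the sketch.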