Minimizing data transfers for regular reachability queries on distributed graphs

Authors:
Quyet Nguyen-Van;Le-Duc Tung;Zhenjiang Hu
Affiliations:
Hung Yen University of Technology and Education, Khoai Chau, Hung Yen;The Graduate University for Advanced Studies, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan
Venue:
Proceedings of the Fourth Symposium on Information and Communication Technology
Year:
2013

Citing 20
Cited 0

Regular expressions into finite automata. Theoretical Computer Science.
An introduction to partial evaluation. ACM Computing Surveys (CSUR).
Distributed query evaluation on semistructured data. ACM Transactions on Database Systems (TODS).
Reachability and Distance Queries via 2-Hop Labels. SIAM Journal on Computing.
Using partial evaluation in distributed query evaluation. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Mixed mode XML query processing. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
MapReduce: simplified data processing on large clusters. Communications of the ACM - 50th anniversary issue: 1958 - 2008.
Fast computing reachability labelings for large graphs with high compression rate. EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology.
Efficiently answering reachability queries on very large directed graphs. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Social Network Extraction of Academic Researchers. ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.
Fault-tolerant computation of distributed regular path queries. Theoretical Computer Science.
On social networks and collaborative recommendation. Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval.
Computing label-constraint reachability in graph databases. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
Pregel: a system for large-scale graph processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
GRAIL: scalable reachability index for large graphs. Proceedings of the VLDB Endowment.
Patterns of temporal variation in online media. Proceedings of the fourth ACM international conference on Web search and data mining.
Adding regular expressions to graph reachability and pattern queries. ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering.
Fast computation of reachability labeling for large graphs. EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology.
Defining and evaluating network communities based on ground-truth. Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics.
Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment.

Abstract

Nowadays, there is an explosion of Internet information, which is normally distributed across different sites. Hence, finding information efficiently becomes difficult. Efficient query evaluation on distributed graphs is an important research topic, since it can be used in real applications such as social network analysis, web mining, and ontology matching. A widely used query on distributed graphs is the regular reachability query (RRQ). An RRQ verifies whether a node can reach another node by a path satisfying a regular expression. Traditionally, RRQs are evaluated by distributed depth-first search or distributed breadth-first search. However, on large graphs these methods are limited by the total network traffic and the response time. Recently, Wenfei Fan et al. proposed an approach that improves reachability queries by visiting each site only once, but it suffers from a communication bottleneck when assembling all the distributed partial query results. In this paper, we propose two algorithms to improve Wenfei Fan's algorithm for RRQs. The first algorithm filters and removes redundant nodes and edges on each local site, in parallel. The second algorithm limits data transfers by contracting the partial result locally. We extensively evaluated our algorithms on MapReduce using the YouTube and DBLP datasets. The experimental results show that our method reduces unnecessary data transfers by up to 60%, which solves the communication bottleneck problem.
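
To make the RRQ definition concrete, the following is a minimal single-machine sketch in Python. It is not the distributed algorithm proposed in the paper or Wenfei Fan's algorithm. It only illustrates what an RRQ asks, using a breadth-first search over pairs of (graph node, automaton state). Several details are assumptions made for illustration: labels are placed on edges, the regular expression is given as an already-built deterministic automaton, and the names `regular_reachable`, `edges`, and `dfa` are hypothetical.

```python
from collections import deque


def regular_reachable(edges, dfa, start_state, accepting, source, target):
    """Return True if some path from source to target spells an accepted word.

    edges: dict mapping a node to a list of (label, next_node) pairs
           (edge labels are an assumption of this sketch).
    dfa:   dict mapping (state, label) to the next state; a missing key
           means the automaton has no such transition.
    """
    frontier = deque([(source, start_state)])
    seen = {(source, start_state)}
    while frontier:
        node, state = frontier.popleft()
        # The query succeeds when the target is reached in an accepting state.
        if node == target and state in accepting:
            return True
        for label, next_node in edges.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # this edge label is not allowed at this point
            pair = (next_node, next_state)
            if pair not in seen:
                seen.add(pair)
                frontier.append(pair)
    return False


# Hypothetical toy graph with labeled edges.
edges = {
    "ann": [("friend", "bob")],
    "bob": [("friend", "cat"), ("colleague", "dan")],
    "cat": [("colleague", "eve")],
}

# Automaton for the expression friend* colleague:
# state 0 loops on "friend"; "colleague" moves to the accepting state 1.
dfa = {(0, "friend"): 0, (0, "colleague"): 1}

print(regular_reachable(edges, dfa, 0, {1}, "ann", "eve"))  # True
print(regular_reachable(edges, dfa, 0, {1}, "ann", "cat"))  # False
```

The second query fails because the only path from "ann" to "cat" uses two "friend" edges and never takes a "colleague" edge, so the automaton does not end in its accepting state. In the distributed setting described in the abstract, the graph is split across sites, so a search of this kind cannot simply follow edges across site boundaries without network communication.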