A distributed index for efficient parallel top-k keyword search on massive graphs

Authors:
Ming Zhong;Mengchi Liu
Affiliations:
Wuhan University, Wuhan, China;Carleton University, Ottawa, Canada
Venue:
Proceedings of the twelfth international workshop on Web information and data management
Year:
2012

Citing 18
Cited 0

Combining fuzzy information from multiple systems (extended abstract)

PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Space/time trade-offs in hash coding with allowable errors

Communications of the ACM
Retrieving and organizing web pages by “information unit”

Proceedings of the 10th international conference on World Wide Web
Minimal probing: supporting expensive predicates for top-k queries

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Keyword Searching and Browsing in Databases using BANKS

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Bidirectional expansion for keyword search on graph databases

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Finding and approximating top-k answers in keyword proximity search

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
BLINKS: ranked keyword searches on graphs

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Keyword proximity search in complex data graphs

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Challenges in building large-scale information retrieval systems: invited talk

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Reachability Indexes for Relational Keyword Search

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Efficient keyword proximity search using a frontier-reduce strategy based on d-distance graph index

IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data-Intensive Text Processing with MapReduce

Data-Intensive Text Processing with MapReduce
SAPPER: subgraph indexing and approximate matching in large graphs

Proceedings of the VLDB Endowment
Efficient subgraph matching on billion node graphs

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recently, a variety of indexing techniques have been proposed for optimizing keyword search on graph. However, graph indexing has very high space and time complexities, and thus these single-machine in-memory indices are usually not affordable for massive graphs. In this paper, we propose a novel distributed disk-based index, which organizes the local topology information in the graph to track and prune matched vertices that will not participate in the top-k answers to a specified query before search with heuristics. The distributed index can be constructed in a MapReduce manner. Moreover, a parallel search algorithm is also developed. It runs multiple asynchronous search instances that incrementally enumerate the current best local answers and then produces the global top-k answers from them. Lastly, we perform experiments on both synthetic and real graphs with various configurations. The results show that our approach can improve search efficiency on massive graphs significantly with affordable indexing overheads.