Recently, a variety of indexing techniques have been proposed for optimizing keyword search on graphs. However, graph indexing has very high space and time complexity, so these single-machine, in-memory indices are usually not affordable for massive graphs. In this paper, we propose a novel distributed disk-based index that organizes local topology information in the graph and uses heuristics to track and prune, before the search begins, matched vertices that will not participate in the top-k answers to a given query. The distributed index can be constructed in a MapReduce manner. We also develop a parallel search algorithm that runs multiple asynchronous search instances, each incrementally enumerating its current best local answers, and then produces the global top-k answers from them. Finally, we perform experiments on both synthetic and real graphs under various configurations. The results show that our approach significantly improves search efficiency on massive graphs with affordable indexing overhead.
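The final step described above, producing global top-k answers from the local answers of several search instances, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each instance yields its answers in ascending order of a cost score (lower is better), and the names `Answer` and `global_top_k` are hypothetical, introduced only for illustration. The asynchronous search instances themselves are not modeled; only the merge is shown.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# Hypothetical answer record: (cost, answer_id). Lower cost is assumed better.
Answer = Tuple[float, str]


def global_top_k(local_streams: Iterable[Iterable[Answer]], k: int) -> List[Answer]:
    """Merge per-instance answer streams and keep the k best answers.

    Assumes every stream is already sorted by ascending cost, as it would be
    if each search instance enumerates its current best local answer first.
    """
    # heapq.merge is lazy: it pulls from each stream only as far as needed,
    # so no instance has to enumerate all of its answers.
    merged = heapq.merge(*local_streams)
    return list(islice(merged, k))


if __name__ == "__main__":
    instance_a = [(1.0, "a1"), (4.0, "a2")]
    instance_b = [(2.0, "b1"), (3.0, "b2")]
    print(global_top_k([instance_a, instance_b], k=3))
```

Because the merge is lazy, each stream is consumed only as far as the k-th global answer requires, which matches the incremental enumeration described in the abstract.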