Information-complete and redundancy-free keyword search over large data graphs

Authors:
Byron J. Gao;Zhumin Chen;Qi Kang
Affiliations:
Texas State University, San Marcos, TX, USA;Shandong University, Jinan, China;Shandong University, Jinan, China
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Keyword search over graphs has a wide array of applications in querying structured, semi-structured, and unstructured data. Existing models typically use minimal trees or bounded subgraphs as query answers. While such models emphasize relevancy, they suffer from incomplete information and redundancy among answers, making it difficult for users to explore query answers effectively. To overcome these drawbacks, we propose a novel cluster-based model in which query answers are relevancy-connected clusters. A cluster is the subgraph induced by a maximal set of relevancy-connected nodes. Such clusters are coherent and relevant, yet complete and redundancy-free. They can be of arbitrary shape, in contrast to the sphere-shaped bounded subgraphs of existing models. We also propose an efficient search algorithm and a corresponding graph index for large, disk-resident data graphs.
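
The idea of a cluster as the subgraph induced by a maximal set of connected, relevant nodes can be illustrated with a minimal in-memory sketch. The abstract does not define "relevancy-connected" precisely, so the sketch assumes a simplified stand-in: a hypothetical set `relevant` of nodes already judged relevant to the query, with two such nodes counted as connected when a path of relevant nodes joins them. The function name `relevancy_clusters` and the adjacency-dictionary representation are likewise illustrative assumptions. This is not the paper's search algorithm or its disk-based index.

```python
from collections import deque


def relevancy_clusters(adj, relevant):
    """Group relevant nodes into maximal connected sets and induce subgraphs.

    adj      -- dict mapping each node to an iterable of its neighbours
                (undirected graph, so every edge appears in both directions)
    relevant -- set of nodes assumed relevant to the query (hypothetical
                stand-in for the paper's relevancy criterion)
    Returns a list of (nodes, edges) pairs, one per cluster.
    """
    seen = set()
    clusters = []
    for start in sorted(relevant):
        if start in seen:
            continue
        # Breadth-first search restricted to relevant nodes; it stops only
        # when no further relevant neighbour exists, so the set is maximal.
        seen.add(start)
        queue = deque([start])
        members = set()
        while queue:
            u = queue.popleft()
            members.add(u)
            for v in adj.get(u, ()):
                if v in relevant and v not in seen:
                    seen.add(v)
                    queue.append(v)
        # Induced subgraph: keep every original edge between two members.
        edges = {
            tuple(sorted((u, v)))
            for u in members
            for v in adj.get(u, ())
            if v in members
        }
        clusters.append((members, edges))
    return clusters


# Path graph a-b-c-d-e in which node "c" is not relevant.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
clusters = relevancy_clusters(adj, {"a", "b", "d", "e"})
```

Because each relevant node is placed in exactly one cluster, the answers in this simplified setting do not overlap, and because each search runs until no relevant neighbour remains, no answer omits a connected relevant node. The shape of a cluster follows the graph rather than a fixed radius around a center node.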