Summarization graph indexing: beyond frequent structure-based approach

Authors:
Lei Zou;Lei Chen;Huaming Zhang;Yansheng Lu;Qiang Lou
Affiliations:
Huazhong University of Science and Technology, Wuhan, China;Hong Kong University of Science and Technology, Hong Kong, China;The University of Alabama in Huntsville, Huntsville, AL;Hong Kong University of Science and Technology, Hong Kong, China;Temple University
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008

Citing 13

Introduction to algorithms

Algorithmics and applications of tree and graph searching
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Signature-based structures for objects with set-valued attributes
Information Systems - Databases: Creation, management and utilization

Similarity Searching in Medical Image Databases
IEEE Transactions on Knowledge and Data Engineering

Frequent Subgraph Discovery
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining

gSpan: Graph-Based Substructure Pattern Mining
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining

Graph indexing: a frequent structure-based approach
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Closure-Tree: An Index Structure for Graph Queries
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering

FIX: feature-based indexing technique for XML documents
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Fg-index: towards verification-free query processing on graph databases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Correlation search in graph databases
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Graph indexing: tree + delta
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Community mining from multi-relational networks
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Cited 5

Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism
Journal of Experimental Algorithmics (JEA)

Efficient and accurate retrieval of business process models through indexing
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems - Volume Part I

Information-geometric graph indexing from bags of partial node coverages
GbRPR'11 Proceedings of the 8th international conference on Graph-based representations in pattern recognition

Efficient querying of large process model repositories
Computers in Industry

Querying business process model repositories
World Wide Web

Abstract

Graphs are an important data structure for modeling complex structural data such as chemical compounds, proteins, and XML documents. Among the many graph-based applications, sub-graph search is a key problem: given a query Q, retrieve all graphs in the graph database that contain Q as a sub-graph. Most existing sub-graph search methods, such as [20, 22, 4, 23], try to filter out as many false positives (graphs that cannot be in the result) as possible by indexing frequent sub-structures in the graph database. However, because they ignore the relationships between sub-structures, these methods still admit a high percentage of false positives. In this paper, we propose a novel concept, the Summarization Graph, which is a complete graph that captures most of the topology information of the original graph, such as its sub-structures and their relationships. Based on Summarization Graphs, we convert the filtering problem into retrieving objects with set-valued attributes. Moreover, we build an efficient signature file-based index to speed up the filtering process. We prove theoretically that the pruning power of our method is greater than that of existing structure-based approaches. Finally, an extensive experimental study on real and synthetic data sets shows that the candidate set generated by the Summarization Graph-based approach is only about 50% of the size of that left by existing graph indexing methods, and that the total response time of our method is reduced by a factor of 2 to 10.
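
The idea of filtering through a signature file over set-valued attributes can be illustrated with a minimal sketch. This is not the index described in the paper: it is a generic superimposed-coding filter, and every name in it (`item_mask`, `signature`, `build_index`, `filter_candidates`, `refine`), the signature width, the number of bits per feature, and the toy feature labels are assumptions made for illustration. Each graph is reduced to a plain set of feature labels, and a query can only match a graph whose signature covers all bits of the query signature.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_ITEM = 3   # assumed number of bit positions hashed per feature


def item_mask(item):
    """Map one feature label to a bit mask with at most BITS_PER_ITEM bits set."""
    mask = 0
    for i in range(BITS_PER_ITEM):
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


def signature(features):
    """Superimpose (bitwise OR) the masks of all features in a set."""
    sig = 0
    for feature in features:
        sig |= item_mask(feature)
    return sig


def build_index(database):
    """Precompute one signature per graph id."""
    return {gid: signature(feats) for gid, feats in database.items()}


def filter_candidates(index, query_features):
    """Keep graphs whose signature covers every bit of the query signature.

    This step never drops a true answer, but it may keep false positives.
    """
    q_sig = signature(query_features)
    return [gid for gid, sig in index.items() if (sig & q_sig) == q_sig]


def refine(database, candidates, query_features):
    """Exact set-containment check on the surviving candidates only."""
    query = set(query_features)
    return [gid for gid in candidates if query <= database[gid]]


# Hypothetical toy database: graph id -> set of feature labels.
database = {
    "g1": {"C-C", "C-O", "C-N"},
    "g2": {"C-C"},
    "g3": {"C-O", "C-N", "N-H"},
}
index = build_index(database)
query = {"C-O", "C-N"}
candidates = filter_candidates(index, query)
answers = refine(database, candidates, query)
```

In this sketch the refinement step is an exact subset test on feature sets. In a real sub-graph search system the surviving candidates would still need a sub-graph isomorphism check, which is omitted here.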