Graph indexing: a frequent structure-based approach

Authors:
Xifeng Yan; Philip S. Yu; Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign; IBM T. J. Watson Research Center; University of Illinois at Urbana-Champaign
Venue:
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Year:
2004

Abstract

Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Unlike existing path-based methods, our approach, called gIndex, uses frequent substructures as the basic indexing features. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable under database updates. To reduce the size of the index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has an index 10 times smaller, yet achieves 3 to 10 times better performance than a typical path-based method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
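
The abstract does not spell out the query procedure, so the following is only a minimal sketch of the general idea of indexing graphs by frequent substructures: keep the fragments whose support meets a threshold, map each one to the ids of the graphs that contain it, and at query time intersect those id lists before running the expensive containment test. Everything here is an illustrative assumption rather than the paper's implementation: graphs are reduced to sets of labeled edges, `contains` is a toy subset check standing in for a real subgraph isomorphism test, and the names `select_frequent`, `build_index`, `candidates`, `answer`, and the `min_support` value are hypothetical. The size-increasing support constraint and discriminative fragment selection are not modelled.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]      # toy labeled edge, e.g. ("C", "N")
Graph = FrozenSet[Edge]     # toy graph: just a set of labeled edges (assumption)


def contains(graph: Graph, fragment: Graph) -> bool:
    """Toy stand-in for a subgraph isomorphism test (assumption)."""
    return fragment <= graph


def select_frequent(db: Dict[int, Graph], fragments: List[Graph],
                    min_support: float) -> List[Graph]:
    """Keep fragments contained in at least min_support of the database graphs."""
    kept = []
    for f in fragments:
        support = sum(contains(g, f) for g in db.values()) / len(db)
        if support >= min_support:
            kept.append(f)
    return kept


def build_index(db: Dict[int, Graph], features: List[Graph]) -> Dict[Graph, Set[int]]:
    """Map each indexed feature to the ids of the graphs that contain it."""
    return {f: {gid for gid, g in db.items() if contains(g, f)} for f in features}


def candidates(query: Graph, index: Dict[Graph, Set[int]], all_ids: Set[int]) -> Set[int]:
    """Filtering step: intersect the id lists of indexed features found in the query."""
    result = set(all_ids)
    for feature, ids in index.items():
        if contains(query, feature):
            result &= ids
    return result


def answer(query: Graph, db: Dict[int, Graph], index: Dict[Graph, Set[int]]) -> Set[int]:
    """Verification step: test containment only on the surviving candidates."""
    return {gid for gid in candidates(query, index, set(db)) if contains(db[gid], query)}


# Hypothetical three-graph database of "molecules" given as labeled edges.
db = {
    1: frozenset({("C", "C"), ("C", "N")}),
    2: frozenset({("C", "C"), ("C", "O")}),
    3: frozenset({("C", "C"), ("C", "N"), ("C", "O")}),
}
fragments = [frozenset({("C", "C")}), frozenset({("C", "N")}), frozenset({("N", "O")})]
features = select_frequent(db, fragments, min_support=0.5)
index = build_index(db, features)
query = frozenset({("C", "C"), ("C", "N")})
result = answer(query, db, index)
```

In this toy run the fragment `("N", "O")` occurs in no graph and is dropped, the two remaining features prune graph 2 before verification, and only graphs 1 and 3 reach the containment test. A real system would replace `contains` with a proper subgraph isomorphism routine, which is where the savings from filtering matter.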