Independent informative subgraph mining for graph information retrieval

Authors:
Bingjun Sun;Prasenjit Mitra;C. Lee Giles
Affiliations:
Pennsylvania State University, University Park, PA, USA;Pennsylvania State University, University Park, PA, USA;Pennsylvania State University, University Park, PA, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 24
Cited 4

Static index pruning for information retrieval systems

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Algorithmics and applications of tree and graph searching

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Digital Libraries and Autonomous Citation Indexing

Computer
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Mining Molecular Fragments: Finding Relevant Substructures of Molecules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
CloseGraph: mining closed frequent graph patterns

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Graph indexing: a frequent structure-based approach

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
A quickstart in frequent structure mining can make a difference

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining Generalized Substructures from a Set of Labeled Graphs

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Improving Web search efficiency via a locality based static pruning method

WWW '05 Proceedings of the 14th international conference on World Wide Web
Substructure similarity search in graph databases

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy

IEEE Transactions on Pattern Analysis and Machine Intelligence
A document-centric approach to static index pruning in text retrieval systems

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Multi-task text segmentation and alignment based on weighted mutual information

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
XML structural delta mining: issues and challenges

Data & Knowledge Engineering - Special issue: ER 2003
Extraction and search of chemical formulae in text documents on the web

Proceedings of the 16th international conference on World Wide Web
Fg-index: towards verification-free query processing on graph databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Topic segmentation with shared topic detection and alignment of multiple documents

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
A platform based on the multi-dimensional data modal for analysis of bio-molecular structures

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Graph indexing: tree + delta

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Mining, indexing, and searching for textual chemical molecule information on the web

Proceedings of the 17th international conference on World Wide Web
A quantitative comparison of the subgraph miners mofa, gspan, FFSM, and gaston

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Using and learning semantics in frequent subgraph mining

WebKDD'05 Proceedings of the 7th international conference on Knowledge Discovery on the Web: advances in Web Mining and Web Usage Analysis

Identifying, Indexing, and Ranking Chemical Formulae and Chemical Names in Digital Documents

ACM Transactions on Information Systems (TOIS)
Ranking structural parameters for social networks

DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications
Lindex: a lattice-based index for graph databases

The VLDB Journal — The International Journal on Very Large Data Bases
Mining and indexing graphs for supergraph search

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

In order to enable scalable querying of graph databases, intelligent selection of subgraphs to index is essential. An improved index can reduce response times for graph queries significantly. For a given subgraph query, graph candidates that may contain the subgraph are retrieved using the graph index and subgraph isomorphism tests are performed to prune out unsatisfied graphs. However, since the space of all possible subgraphs of the whole set of graphs is prohibitively large, feature selection is required to identify a good subset of subgraph features for indexing. Thus, one of the key issues is: given the set of all possible subgraphs of the graph set, which subset of features is the optimal such that the algorithm retrieves the smallest set of candidate graphs and reduces the number of subgraph isomorphism tests? We introduce a graph search method for subgraph queries based on subgraph frequencies. Then, we propose several novel feature selection criteria, Max-Precision, Max-Irredundant-Information, and Max-Information-Min-Redundancy, based on mutual information. Finally we show theoretically and empirically that our proposed methods retrieve a smaller candidate set than previous methods. For example, using the same number of features, our method improve the precision for the query candidate set by 4%-13% in comparison to previous methods. As a result the response time of subgraph queries also is improved correspondingly.