Indexing and mining topological patterns for drug discovery

Authors:
Sayan Ranu;Ambuj K. Singh
Affiliations:
University of California, Santa Barbara, CA;University of California, Santa Barbara, CA
Venue:
Proceedings of the 15th International Conference on Extending Database Technology
Year:
2012

Abstract

The increased availability of large repositories of chemical compounds has created new challenges and opportunities for applying data-mining and indexing techniques to problems in chemical informatics. The primary goal in analyzing molecular databases is to identify structural patterns that can predict biological activity. Two of the most popular representations of molecular topology are graphs and 3D geometries. As a result, the problem of indexing and mining structural patterns maps to indexing and mining patterns in graph and 3D geometric databases. In this tutorial, we first introduce the problem of drug discovery and the critical role that computer science plays in that process. We then introduce the problem of performing subgraph and similarity searches on large graph databases. Because these problems are NP-hard, a number of heuristics have been designed in recent years, and the tutorial presents an overview of those techniques. Next, we introduce the problem of mining frequent subgraph patterns, along with some of the limitations of such patterns that sparked interest in mining statistically significant subgraph patterns. After an in-depth survey of techniques for mining significant subgraph patterns, the tutorial turns to the problem of analyzing the 3D geometric structures of molecules. Finally, we conclude by presenting two open computer science problems that can have a significant impact on the field of drug discovery.
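
To make the subgraph search and frequent-pattern problems named in the abstract concrete, the following is a minimal sketch of a naive baseline, not one of the indexing or mining techniques the tutorial surveys. It assumes molecules are modeled as undirected graphs whose nodes carry atom labels, with hydrogens and bond orders ignored. The helper names (`make_graph`, `contains`, `support`) and the three toy molecules are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed toy representation: (node id -> atom label, node id -> neighbor set).
Graph = Tuple[Dict[int, str], Dict[int, Set[int]]]


def make_graph(labels: Dict[int, str], edges: List[Tuple[int, int]]) -> Graph:
    """Build an undirected labeled graph from node labels and an edge list."""
    adj: Dict[int, Set[int]] = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return labels, adj


def contains(target: Graph, query: Graph) -> bool:
    """Return True if query occurs in target as a (non-induced) subgraph.

    Plain backtracking with label checks; worst-case exponential time,
    which is why real systems add indexes and pruning heuristics.
    """
    t_labels, t_adj = target
    q_labels, q_adj = query
    order = list(q_labels)
    mapping: Dict[int, int] = {}  # query node -> target node

    def extend(i: int) -> bool:
        if i == len(order):
            return True  # every query node has been placed
        q = order[i]
        for t in t_labels:
            if t in mapping.values() or t_labels[t] != q_labels[q]:
                continue
            # Each already-mapped neighbor of q must be adjacent to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(i + 1):
                    return True
                del mapping[q]  # undo and try the next candidate
        return False

    return extend(0)


def support(database: List[Graph], query: Graph) -> int:
    """Count the database graphs that contain the query pattern."""
    return sum(contains(g, query) for g in database)


# Hypothetical toy database: heavy-atom skeletons only.
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
methylamine = make_graph({0: "C", 1: "N"}, [(0, 1)])
acetic_acid = make_graph({0: "C", 1: "C", 2: "O", 3: "O"},
                         [(0, 1), (1, 2), (1, 3)])
database = [ethanol, methylamine, acetic_acid]

# Query pattern: a carbon bonded to an oxygen.
c_o = make_graph({0: "C", 1: "O"}, [(0, 1)])

if __name__ == "__main__":
    print(support(database, c_o))
```

In this sketch a pattern counts as frequent when its support meets a chosen threshold. Scanning every graph with an exponential-time containment test is what motivates the heuristics mentioned in the abstract.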