Mining protein family specific residue packing patterns from protein structure graphs

Authors:
Jun Huan;Wei Wang;Deepak Bandyopadhyay;Jack Snoeyink;Jan Prins;Alexander Tropsha
Affiliations:
University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill;University of North Carolina at Chapel Hill
Venue:
RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Year:
2004

Citing 14
Cited 17

Simple fast algorithms for the editing distance between trees and related problems

SIAM Journal on Computing
The nature of statistical learning theory

The nature of statistical learning theory
The quickhull algorithm for convex hulls

ACM Transactions on Mathematical Software (TOMS)
Distribution of distances and triangles in a point set and algorithms for computing the largest common point sets

SCG '97 Proceedings of the thirteenth annual symposium on Computational geometry
RAPID: randomized pharmacophore identification for drug design

SCG '97 Proceedings of the thirteenth annual symposium on Computational geometry
An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees

IEEE Transactions on Pattern Analysis and Machine Intelligence
Geometric matching under noise: combinatorial bounds and algorithms

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Introduction to Algorithms

Introduction to Algorithms
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Mining Molecular Fragments: Finding Relevant Substructures of Molecules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Local Similarity in RNA Secondary Structures

CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Almost-Delaunay simplices: nearest neighbor relations for imprecise points

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms

SPIN: mining maximal frequent subgraphs from graph databases

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Summarizing itemset patterns: a profile-based approach

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Mining closed relational graphs with connectivity constraints

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Almost-Delaunay simplices: Robust neighbor relations for imprecise 3D points using CGAL

Computational Geometry: Theory and Applications
Effective and efficient itemset pattern summarization: regression-based approaches

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Structure feature selection for graph classification

Proceedings of the 17th ACM conference on Information and knowledge management
Efficient query processing on graph databases

ACM Transactions on Database Systems (TODS)
Graph classification based on pattern co-occurrence

Proceedings of the 18th ACM conference on Information and knowledge management
gRegress: extracting features from graph transactions for regression

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Protein Structure Classification Based on Conserved Hydrophobic Residues

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Output space sampling for graph patterns

Proceedings of the VLDB Endowment
Towards proximity pattern mining in large graphs

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Mining structured data

IEEE Computational Intelligence Magazine
An efficient features-based processing technique for supergraph queries

Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Fast graph query processing with a low-cost index

The VLDB Journal — The International Journal on Very Large Data Bases
EFP-M2: efficient model for mining frequent patterns in transactional database

ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part II
Hybrid query execution engine for large attributed graphs

Information Systems

Abstract

Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. We apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex-residues are connected by edges using three approaches: first, based on a simple distance threshold between contact residues; second, using the Delaunay tessellation from computational geometry; and third, using the recently developed almost-Delaunay tessellation approach. Applying a frequent subgraph mining algorithm to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs equivalent to common packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment. The resulting models are capable of predicting the protein family association with accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are sparser than the contact distance graphs, yet they are robust and efficient for mining protein spatial motifs.
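
The first of the three graph constructions, the contact distance graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `contact_graph`, the representation of a residue as a label plus one 3D coordinate, the toy coordinates, and the 7.0 cutoff are all assumptions made for illustration, since the abstract does not state which point represents a residue or what threshold was used.

```python
from itertools import combinations
from math import dist


def contact_graph(residues, threshold):
    """Build a contact distance graph (illustrative sketch only).

    residues  : list of (label, (x, y, z)) pairs, one per amino acid.
                Using a single representative point per residue is an
                assumption; the abstract does not specify the choice.
    threshold : distance cutoff; the value is hypothetical, not taken
                from the paper.

    Returns (labels, edges), where labels[i] is the vertex label of
    residue i and edges is a set of index pairs (i, j) with i < j.
    """
    labels = [label for label, _ in residues]
    edges = set()
    # Connect every pair of residues whose points lie within the cutoff.
    for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2):
        if dist(p, q) <= threshold:
            edges.add((i, j))
    return labels, edges


# Toy coordinates, invented for this example.
toy = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("GLY", (3.0, 4.0, 0.0)),   # 5.0 away from ALA
    ("SER", (20.0, 0.0, 0.0)),  # far from every other residue
    ("LEU", (3.0, 4.0, 6.0)),   # 6.0 away from GLY, about 7.8 from ALA
]
labels, edges = contact_graph(toy, threshold=7.0)
print(labels)
print(sorted(edges))
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for a sketch. The Delaunay and almost-Delaunay graphs described in the abstract would replace the distance test with membership in a tessellation, which is why they yield sparser graphs than the contact distance construction.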