Frequent Substructure-Based Approaches for Classifying Chemical Compounds

Authors:
Mukund Deshpande, Michihiro Kuramochi, Nikil Wale, George Karypis
Affiliations:
George Karypis: IEEE Computer Society
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

Computational techniques that build models to correctly assign chemical compounds to various classes of interest have many applications in pharmaceutical research and are used extensively at various phases of the drug development process. These techniques solve a number of classification problems, such as predicting whether a chemical compound has the desired biological activity, predicting whether it is toxic or nontoxic, and filtering out drug-like compounds from large compound libraries. This paper presents a substructure-based classification algorithm that decouples the substructure discovery process from the construction of the classification model and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the data set. The advantage of this approach is that all relevant substructures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 7 percent to 35 percent.
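The decoupled design described above can be illustrated with a small sketch. It runs in three stages. First, substructures are mined once. Second, each compound becomes a binary vector recording which substructures it contains. Third, feature selection and a classifier operate on those vectors. The sketch rests on heavy simplifications, none of which come from the paper:

- Molecules use a hypothetical format: a list of atom labels plus a list of labeled bonds.
- "Substructures" are limited to labeled bonds and two-bond paths, not arbitrary topological or geometric subgraphs.
- Feature selection ranks fragments by the gap between their frequencies in the two classes.
- A plain perceptron stands in for the classifier, which the abstract does not name.

All function names (`substructures`, `mine_frequent`, `fit`, `classify`) are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input format (not from the paper): a molecule is a list of atom
# labels plus a list of bonds (atom index, atom index, bond label).
Molecule = tuple[list[str], list[tuple[int, int, str]]]


def substructures(mol: Molecule) -> set[tuple]:
    """Canonical labeled bonds and two-bond paths present in one molecule.

    These tiny fragments stand in for the arbitrary connected subgraphs that a
    real frequent subgraph miner would enumerate.
    """
    atoms, bonds = mol
    found: set[tuple] = set()
    neighbors: dict[int, list[tuple[int, str]]] = {i: [] for i in range(len(atoms))}
    for i, j, bond in bonds:
        lo, hi = sorted((atoms[i], atoms[j]))
        found.add(("edge", lo, bond, hi))
        neighbors[i].append((j, bond))
        neighbors[j].append((i, bond))
    for center, nbrs in neighbors.items():
        # Each unordered pair of bonds at a center atom forms a 3-atom path;
        # sorting the two ends makes the pattern independent of atom order.
        for (u, bu), (v, bv) in combinations(nbrs, 2):
            end1, end2 = sorted([(atoms[u], bu), (atoms[v], bv)])
            found.add(("path2", atoms[center], end1, end2))
    return found


def mine_frequent(mols: list[Molecule], min_support: float) -> list[tuple]:
    """Stage 1: keep fragments contained in at least min_support of the molecules."""
    counts: Counter = Counter()
    for mol in mols:
        counts.update(substructures(mol))  # a set, so each molecule counts once
    return sorted(p for p, c in counts.items() if c / len(mols) >= min_support)


def featurize(mol: Molecule, features: list[tuple]) -> list[int]:
    """Binary vector: 1 if the molecule contains the fragment, else 0."""
    present = substructures(mol)
    return [1 if f in present else 0 for f in features]


def select_features(X: list[list[int]], y: list[int], k: int) -> list[int]:
    """Stage 2 stand-in: keep the k columns whose frequency differs most between classes."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]

    def score(col: int) -> float:
        p = sum(x[col] for x in pos) / len(pos)
        q = sum(x[col] for x in neg) / len(neg)
        return abs(p - q)

    ranked = sorted(range(len(X[0])), key=lambda c: (-score(c), c))
    return sorted(ranked[:k])


def train_perceptron(X: list[list[int]], y: list[int], epochs: int = 20):
    """Stage 3 stand-in classifier: a plain perceptron with labels mapped to +/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t = 1 if label == 1 else -1
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
    return w, b


def fit(mols: list[Molecule], y: list[int], min_support: float, k: int):
    """Run the decoupled pipeline: mine once, then select and train on the result."""
    features = mine_frequent(mols, min_support)
    X = [featurize(m, features) for m in mols]
    cols = select_features(X, y, k)
    model = train_perceptron([[x[c] for c in cols] for x in X], y)
    kept = [features[c] for c in cols]
    return kept, model


def classify(mol: Molecule, kept: list[tuple], model) -> int:
    """Predict 1 (active) or 0 using only the selected fragments."""
    w, b = model
    x = featurize(mol, kept)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Consider four toy molecules in which only the two containing a C=O bond are labeled active. Calling `fit(train, [1, 1, 0, 0], min_support=0.5, k=1)` keeps the single fragment `("edge", "C", "=", "O")`, and `classify` then reproduces the training labels. Mining runs once and yields a fixed fragment vocabulary, so the selection and classification stages can each be swapped without re-mining, which is the point of decoupling them. A realistic version would replace `substructures` with a genuine frequent subgraph miner and would also handle geometric fragments. This sketch attempts neither.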