Kernels for small molecules and the prediction of mutagenicity, toxicity and anti-cancer activity

Authors:
S. Joshua Swamidass; Jonathan Chen; Jocelyne Bruand; Peter Phung; Liva Ralaivola; Pierre Baldi
Affiliations:
Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California Irvine, CA, USA (all authors)
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Small molecules play a fundamental role in organic chemistry and biology. They can be used to probe biological systems and to discover new drugs and other useful compounds. As increasingly large datasets of small molecules become available, computational methods are needed that can handle molecules of variable size and structure and predict their physical, chemical and biological properties.

Results: Here we develop several new classes of kernels for small molecules using their 1D, 2D and 3D representations. In 1D, we consider string kernels based on SMILES strings. In 2D, we introduce several similarity kernels based on conventional or generalized fingerprints. Generalized fingerprints are derived by counting, in different ways, the subpaths contained in the graph of bonds, which are enumerated by depth-first search. In 3D, we consider similarity measures between histograms of pairwise distances between atom classes. These kernels can be computed efficiently, and we apply them to the classification and prediction of mutagenicity, toxicity and anti-cancer activity on three publicly available datasets. The results, derived using cross-validation, are state-of-the-art. Tradeoffs between the various kernels are briefly discussed.

Availability: Datasets are available from http://www.igb.uci.edu/servers/servers.html

Contact: pfbaldi@ics.uci.edu
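The abstract does not say which string kernel is applied to SMILES strings in 1D. As a minimal sketch, assuming the k-spectrum kernel (the inner product of the counts of length-k substrings) with plain character-level tokenization, a 1D kernel could look like this; the function names and the default k = 3 are illustrative choices, not the paper's.

```python
import math
from collections import Counter


def spectrum(s: str, k: int) -> Counter:
    """Count every length-k substring of s (the k-spectrum feature map)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the two k-spectrum count vectors."""
    fa, fb = spectrum(a, k), spectrum(b, k)
    # Only substrings present in both strings contribute to the dot product.
    return sum(count * fb[sub] for sub, count in fa.items() if sub in fb)


def normalized_spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalize so that every non-empty spectrum has self-similarity 1."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0
```

For example, with k = 2, ethanol (`CCO`) and 1-propanol (`CCCO`) share the substrings `CC` and `CO`, giving a raw kernel value of 1·2 + 1·1 = 3 and a normalized value of 3/√10 ≈ 0.95. Character-level tokenization splits two-letter atoms such as `Cl` and `Br`, so a more careful sketch would tokenize SMILES atoms before counting.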
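In 2D, generalized fingerprints are described as counts of subpaths of the bond graph found by depth-first search. A minimal sketch follows. It assumes a hand-rolled adjacency-list molecule (hydrogens suppressed, no cheminformatics library), paths labeled by alternating atom and bond symbols, and simple paths of at most `max_depth` bonds. For similarity it uses two common fingerprint measures: Tanimoto on the binary (presence/absence) fingerprint and MinMax on the count fingerprint. The helper names and the molecule format are hypothetical, and the paper's exact labeling, depth and counting schemes may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy molecule format: atom symbols plus an adjacency list whose
# entries are (neighbor_index, bond_symbol), e.g. "-" single, "=" double.
Molecule = Tuple[List[str], Dict[int, List[Tuple[int, str]]]]


def make_molecule(atoms: List[str], bonds: List[Tuple[int, int, str]]) -> Molecule:
    """Build the toy adjacency list from (i, j, bond_symbol) triples."""
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(atoms))}
    for i, j, sym in bonds:
        adj[i].append((j, sym))
        adj[j].append((i, sym))
    return atoms, adj


def path_fingerprint(mol: Molecule, max_depth: int = 4) -> Counter:
    """Count labeled simple paths of up to max_depth bonds, found by DFS."""
    atoms, adj = mol
    counts: Counter = Counter()

    def dfs(path: List[int], labels: List[str]) -> None:
        # Each undirected path is traversed from both ends; record it only
        # from the end with the smaller atom index so it is counted once.
        if path[0] <= path[-1]:
            forward = tuple(labels)
            # Canonical label: the lexicographically smaller reading direction.
            counts["".join(min(forward, forward[::-1]))] += 1
        if len(path) - 1 >= max_depth:
            return
        for nbr, bond in adj.get(path[-1], []):
            if nbr not in path:  # simple paths only: never revisit an atom
                dfs(path + [nbr], labels + [bond, atoms[nbr]])

    for start in range(len(atoms)):
        dfs([start], [atoms[start]])
    return counts


def tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto on binary fingerprints: |A & B| / |A | B| over the path sets."""
    sa, sb = set(a), set(b)
    union = len(sa | sb)
    return len(sa & sb) / union if union else 0.0


def minmax(a: Counter, b: Counter) -> float:
    """MinMax on count fingerprints: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    den = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / den if den else 0.0
```

For ethanol (C-C-O) this gives the fingerprint {C: 2, O: 1, C-C: 1, C-O: 1, C-C-O: 1}. All five ethanol paths also occur in 1-propanol (C-C-C-O), which has seven distinct paths, so Tanimoto gives 5/7 ≈ 0.71, while MinMax, which also compares the counts, gives 6/10 = 0.6.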
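In 3D, the abstract describes similarity measures between histograms of pairwise distances between atom classes. A minimal sketch follows. It assumes element symbols as the atom classes, Cartesian coordinates in ångströms, and a fixed bin width of 0.5 Å. For the histogram similarity it uses a MinMax ratio (sum of bin-wise minima over sum of bin-wise maxima). All of these choices, and the tuple-based atom format, are hypothetical rather than the paper's exact settings.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple

# Hypothetical atom format: (atom class, x, y, z) with coordinates in angstroms.
Atom = Tuple[str, float, float, float]


def distance_histogram(atoms: List[Atom], bin_width: float = 0.5) -> Counter:
    """Count atom pairs keyed by (unordered class pair, distance bin index)."""
    hist: Counter = Counter()
    for (ca, *pa), (cb, *pb) in combinations(atoms, 2):
        pair = tuple(sorted((ca, cb)))  # ("H", "O") and ("O", "H") coincide
        d = math.dist(pa, pb)           # Euclidean distance between the atoms
        hist[(pair, int(d // bin_width))] += 1
    return hist


def histogram_minmax(h1: Counter, h2: Counter) -> float:
    """MinMax similarity between two pair-distance histograms."""
    keys = set(h1) | set(h2)
    den = sum(max(h1[k], h2[k]) for k in keys)
    return sum(min(h1[k], h2[k]) for k in keys) / den if den else 0.0
```

Because only interatomic distances enter the histogram, the representation does not change when the molecule is translated or rotated, so no structural alignment is needed. The tradeoff is sensitivity to the bin width: a distance close to a bin edge can switch bins under a small conformational change.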