Graph kernels for chemical compounds using topological and three-dimensional local atom pair environments

Authors:
Georg Hinselmann;Nikolas Fechner;Andreas Jahn;Matthias Eckert;Andreas Zell
Affiliations:
Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany
Venue:
Neurocomputing
Year:
2010

Abstract

Approaches that can predict the biological activity or properties of a chemical compound are an important application of machine learning. In this paper, we introduce a new kernel function for measuring the similarity between chemical compounds and for learning their related properties and activities. The method is based on local atom pair environments, which can be rapidly computed by using the topological all-shortest paths matrix and the geometrical distance matrix of a molecular graph as lookup tables. The local atom pair environments are stored in prefix search trees, so-called tries, for efficient comparison. The kernel can be computed either as an optimal assignment kernel or as a corresponding convolution kernel over all local atom similarities. We implemented the Tanimoto kernel, min kernel, minmax kernel and the dot product kernel as local kernels, which are computed recursively by traversing the tries. We tested the approach on eight structure-activity and structure-property molecule benchmark data sets from the literature. The models were trained with ε-support vector regression and support vector classification. In a direct comparison, the local atom pair kernels proved at least competitive with state-of-the-art kernels in seven out of eight cases. A comparison against literature results, using experimental setups similar to those in the original works, confirmed these findings. The method is easy to implement and has robust default parameters.
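
To make the four local kernels named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It makes several assumptions that the abstract does not specify: the environment of an atom is taken to be a multiset of (partner element, topological distance) pairs, a `Counter` stands in for the tries, and the kernel formulas are the standard definitions on count vectors. The function names are hypothetical. The geometrical distance matrix and the optimal assignment step are left out.

```python
from collections import Counter
from itertools import product

INF = float("inf")

def shortest_path_matrix(n_atoms, bonds):
    """All-pairs topological distances (number of bonds) via Floyd-Warshall."""
    dist = [[0 if i == j else INF for j in range(n_atoms)] for i in range(n_atoms)]
    for a, b in bonds:
        dist[a][b] = dist[b][a] = 1
    # product(..., repeat=3) varies k slowest, so k is the outer loop.
    for k, i, j in product(range(n_atoms), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def atom_environment(center, symbols, dist):
    """Assumed environment: counts of (partner symbol, distance) around one atom."""
    return Counter(
        (symbols[j], dist[center][j])
        for j in range(len(symbols))
        if j != center and dist[center][j] != INF
    )

def dot_kernel(x, y):
    """Dot product of two count vectors."""
    return sum(x[f] * y[f] for f in x.keys() & y.keys())

def min_kernel(x, y):
    """Sum of feature-wise minima (multiset intersection size)."""
    return sum(min(x[f], y[f]) for f in x.keys() & y.keys())

def minmax_kernel(x, y):
    """Sum of minima divided by sum of maxima."""
    denom = sum(max(x[f], y[f]) for f in x.keys() | y.keys())
    return min_kernel(x, y) / denom if denom else 0.0

def tanimoto_kernel(x, y):
    """<x,y> / (<x,x> + <y,y> - <x,y>)."""
    xy = dot_kernel(x, y)
    denom = dot_kernel(x, x) + dot_kernel(y, y) - xy
    return xy / denom if denom else 0.0

# Heavy-atom graph of isobutane: atom 0 is bonded to atoms 1, 2 and 3.
symbols = ["C", "C", "C", "C"]
dist = shortest_path_matrix(len(symbols), [(0, 1), (0, 2), (0, 3)])
center_env = atom_environment(0, symbols, dist)
leaf_env = atom_environment(1, symbols, dist)

for kernel in (dot_kernel, min_kernel, minmax_kernel, tanimoto_kernel):
    print(kernel.__name__, kernel(center_env, leaf_env))
```

In this sketch the distance matrix is computed once per molecule and then only read, which mirrors the lookup-table role described in the abstract. A trie would additionally let environments that share a prefix be compared without enumerating every feature.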