Graph kernels for chemical compounds using topological and three-dimensional local atom pair environments

  • Authors:
  • Georg Hinselmann;Nikolas Fechner;Andreas Jahn;Matthias Eckert;Andreas Zell

  • Affiliations:
  • Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany;Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany

  • Venue:
  • Neurocomputing
  • Year:
  • 2010

Abstract

Approaches that can predict the biological activity or properties of a chemical compound are an important application of machine learning. In this paper, we introduce a new kernel function for measuring the similarity between chemical compounds and for learning their related properties and activities. The method is based on local atom pair environments, which can be computed rapidly by using the topological all-shortest-paths matrix and the geometrical distance matrix of a molecular graph as lookup tables. The local atom pair environments are stored in prefix search trees, so-called tries, for efficient comparison. The kernel can be computed either as an optimal assignment kernel or as a corresponding convolution kernel over all local atom similarities. We implemented the Tanimoto kernel, min kernel, minmax kernel, and dot product kernel as local kernels, which are computed recursively by traversing the tries. We tested the approach on eight structure-activity and structure-property molecule benchmark data sets from the literature. The models were trained with ε-support vector regression and support vector classification. In a direct comparison, the local atom pair kernels proved at least competitive with state-of-the-art kernels in seven out of eight cases. A comparison against literature results, using experimental setups similar to those of the original works, confirmed these findings. The method is easy to implement and has robust default parameters.
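
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and makes several simplifying assumptions: each local environment is a flat count of hypothetical (atom, topological distance, atom) patterns held in a `Counter` rather than a trie, the geometrical distance matrix is left out, and only the convolution variant is shown, not the optimal assignment variant. All function names, the `max_depth` parameter, and the toy molecules are illustrative. The four local kernels follow their standard definitions on count vectors.

```python
from collections import Counter, deque


def shortest_paths(adj):
    """All-pairs topological distances of an unweighted graph via BFS."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def local_environments(symbols, adj, max_depth=3):
    """One Counter per atom, using the distance matrix as a lookup table.

    Assumption: a pattern is (center symbol, distance, partner symbol).
    """
    dist = shortest_paths(adj)
    envs = []
    for i, a in enumerate(symbols):
        env = Counter()
        for j, b in enumerate(symbols):
            d = dist[i][j]
            if i != j and d is not None and d <= max_depth:
                env[(a, d, b)] += 1
        envs.append(env)
    return envs


def dot_kernel(x, y):
    return sum(c * y[k] for k, c in x.items() if k in y)


def min_kernel(x, y):
    return sum(min(c, y[k]) for k, c in x.items() if k in y)


def minmax_kernel(x, y):
    # Counter returns 0 for missing keys, so max() is safe over the union.
    denom = sum(max(x[k], y[k]) for k in set(x) | set(y))
    return min_kernel(x, y) / denom if denom else 0.0


def tanimoto_kernel(x, y):
    xy = dot_kernel(x, y)
    denom = dot_kernel(x, x) + dot_kernel(y, y) - xy
    return xy / denom if denom else 0.0


def convolution_kernel(envs_a, envs_b, local=minmax_kernel):
    """Sum of local kernel values over all pairs of atoms."""
    return sum(local(ea, eb) for ea in envs_a for eb in envs_b)


# Toy heavy-atom graphs (illustrative only): ethanol C-C-O and methanol C-O.
ethanol = local_environments(["C", "C", "O"], [[1], [0, 2], [1]])
methanol = local_environments(["C", "O"], [[1], [0]])
similarity = convolution_kernel(ethanol, methanol, local=tanimoto_kernel)
```

A Gram matrix built from such kernel values could then be passed to a support vector machine that accepts precomputed kernels. Replacing the flat counters with tries, as the abstract describes, would change how the local kernels are evaluated but not the values they are meant to produce.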