Molecular feature mining in HIV data

  • Authors:
  • Stefan Kramer; Luc De Raedt; Christoph Helma

  • Affiliations:
  • Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany (all three authors)

  • Venue:
  • Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2001

Abstract

We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43,576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. Based on these measurements, the compounds were classified as active, moderately active, or inactive. The class distribution is extremely skewed: only 1.3 % of the molecules are known to be active, and 2.7 % are known to be moderately active. Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, the features of interest can be specified declaratively, with constraints on their frequency on (possibly different) datasets as well as on their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, they should help facilitate the development of new pharmaceuticals with improved activities.
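The query described above combines two constraints that behave differently under generalization. A minimum support on the actives is anti-monotone: if a fragment meets it, so does every more general sub-fragment. A maximum support on the inactives is monotone. A levelwise search can therefore prune on the first constraint while growing patterns and filter on the second. The following is a minimal sketch under loudly stated assumptions, not MOLFEA's implementation. Molecules are reduced to toy strings, features are contiguous substrings, and the hypothetical function `levelwise_mine` simply enumerates every solution instead of computing version-space boundaries. The parameters `min_act` and `max_inact` are illustrative absolute counts.

```python
from typing import List, Sequence


def support(pattern: str, molecules: Sequence[str]) -> int:
    """Count molecules that contain `pattern` as a contiguous substring."""
    return sum(1 for m in molecules if pattern in m)


def levelwise_mine(actives: Sequence[str], inactives: Sequence[str],
                   min_act: int, max_inact: int) -> List[str]:
    """Hypothetical levelwise miner over toy string 'molecules'.

    Returns every substring with support >= min_act in `actives`
    and support <= max_inact in `inactives`.
    """
    if min_act < 1:
        # With min_act == 0 every string would qualify and the search would not end.
        raise ValueError("min_act must be at least 1")
    # Any pattern frequent in the actives uses only symbols that occur in them.
    alphabet = sorted({ch for m in actives for ch in m})
    results: List[str] = []
    # Level 1: single symbols that satisfy the anti-monotone constraint.
    level = [c for c in alphabet if support(c, actives) >= min_act]
    while level:
        # Apply the monotone constraint (maximum support on inactives) as a filter.
        results.extend(p for p in level if support(p, inactives) <= max_inact)
        # Grow each surviving pattern by one symbol on the right. Every prefix of a
        # frequent pattern is itself frequent, so no solution is missed by pruning.
        level = [p + c for p in level for c in alphabet
                 if support(p + c, actives) >= min_act]
    return results


if __name__ == "__main__":
    # Toy data only; these strings are not real compounds from the screen.
    actives = ["CCO", "CCN", "CCOC"]
    inactives = ["CCC", "OCC", "NCO"]
    print(levelwise_mine(actives, inactives, min_act=2, max_inact=0))
```

On this toy data the only substring that occurs in at least two actives and in no inactive is `"CCO"`. A real system would replace substrings with chemically meaningful fragments and a proper generality relation between them.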