Molecular feature mining in HIV data

Authors:
Stefan Kramer;Luc De Raedt;Christoph Helma
Affiliations:
Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany;Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany;Institute for Computer Science, Machine Learning Lab, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee Geb. 79, D-79110 Freiburg/Br., Germany
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Abstract

We present the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43,576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as active, moderately active, or inactive. The distribution of classes is extremely skewed: only 1.3% of the molecules are known to be active, and 2.7% are known to be moderately active.

Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules and infrequent in the inactive ones. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest through constraints on their frequency in (possibly different) datasets as well as on their generality and syntax. Assuming that the detected substructures are causally related to biochemical mechanisms, they should facilitate the development of new pharmaceuticals with improved activities.
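The combination of a minimum support in active compounds and a maximum support in inactive compounds can be illustrated with a small sketch. It is not MOLFEA and it does not implement the levelwise version space algorithm: molecules are represented by made-up strings, a "fragment" is simply a substring, and the names support and mine_fragments are hypothetical. The sketch only shows how a levelwise search can use the minimum-support constraint for pruning and then apply the maximum-support constraint as a filter.

```python
from typing import Iterable, List, Set


def support(fragment: str, molecules: Iterable[str]) -> int:
    """Count the molecules that contain the fragment as a substring.

    Assumption: substring containment stands in for real substructure matching.
    """
    return sum(1 for m in molecules if fragment in m)


def mine_fragments(actives: List[str], inactives: List[str],
                   min_active: int, max_inactive: int) -> Set[str]:
    """Return fragments with support >= min_active in the actives
    and support <= max_inactive in the inactives."""
    alphabet = sorted({ch for m in actives for ch in m})
    # Level 1: single symbols that are frequent enough in the actives.
    level = {a for a in alphabet if support(a, actives) >= min_active}
    frequent: Set[str] = set()
    while level:
        frequent |= level
        # Minimum support is anti-monotone: a fragment can only be frequent
        # if its prefix is, so only frequent fragments are extended.
        candidates = {f + a for f in level for a in alphabet}
        level = {c for c in candidates if support(c, actives) >= min_active}
    # The maximum-support constraint is checked afterwards on the inactives.
    return {f for f in frequent if support(f, inactives) <= max_inactive}


if __name__ == "__main__":
    # Toy strings, not compounds from the screen database.
    toy_actives = ["CCN=O", "CN=OC", "OCN=O"]
    toy_inactives = ["CCO", "CCN", "OCC"]
    print(sorted(mine_fragments(toy_actives, toy_inactives, 3, 0)))
```

On the toy data, fragments such as "CN=O" occur in every active string and in no inactive one, so they satisfy both constraints, whereas "C" is frequent in the actives but is rejected because it also occurs in the inactives.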