Mining chemical compound structure data using inductive logic programming

Authors:
Cholwich Nattee;Sukree Sinthupinyo;Masayuki Numao;Takashi Okada
Affiliations:
The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan;The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan;The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan;Department of Informatics, School of Science and Technology, Kwansei Gakuin University, Hyogo, Japan
Venue:
AM'03 Proceedings of the Second international conference on Active Mining
Year:
2003

Abstract

Discovering knowledge from chemical compound structure data is a challenging task in KDD. It aims to generate hypotheses describing the activities or characteristics of chemical compounds from their structures. Since each compound is composed of several parts with complicated relations among them, traditional mining algorithms cannot handle this kind of data efficiently. In this research, we apply Inductive Logic Programming (ILP) to classify chemical compounds. ILP makes learning results comprehensible and can handle more complex data that include relations. Nevertheless, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing learning approaches inefficient compared with propositional approaches. We introduce an improved ILP approach that handles more efficiently a kind of data called multiple-part data, i.e., data in which one instance consists of several parts as well as relations among those parts. The approach tries to find hypotheses describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Chemical compound data are multiple-part data: each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to chemical compound structures by conducting experiments on two real-world datasets: mutagenicity in nitroaromatic compounds and dopamine antagonist compounds. The experimental results are compared with those of previous approaches to show the performance of the proposed approach.
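
To make the notion of multiple-part data concrete, the following is a minimal Python sketch of one way a compound could be represented as atoms (parts) and bonds (relations among parts), together with a test for one very simple relational pattern. It is not the authors' algorithm or data format. The class names, the `charge` attribute, the helper functions, and the two toy compounds are all assumptions made for illustration and are not taken from the mutagenicity or dopamine antagonist datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One part of a compound (hypothetical attributes)."""
    atom_id: str
    element: str
    charge: float


@dataclass(frozen=True)
class Bond:
    """A relation between two parts, identified by their atom ids."""
    first: str
    second: str
    kind: str


@dataclass(frozen=True)
class Compound:
    """One multiple-part instance: parts, relations, and a class label."""
    name: str
    atoms: tuple
    bonds: tuple
    label: str


def has_bonded_pair(compound, element_a, element_b, kind):
    """Return True if a bond of `kind` joins an `element_a` atom to an `element_b` atom."""
    element_of = {atom.atom_id: atom.element for atom in compound.atoms}
    for bond in compound.bonds:
        if bond.kind != kind:
            continue
        ends = (element_of[bond.first], element_of[bond.second])
        # Bonds are treated as undirected, so check both orientations.
        if ends in ((element_a, element_b), (element_b, element_a)):
            return True
    return False


def covered(compounds, element_a, element_b, kind):
    """Names of the compounds that contain the given bonded pair."""
    return [c.name for c in compounds
            if has_bonded_pair(c, element_a, element_b, kind)]


# Toy, made-up compounds used only to exercise the functions above.
toy_a = Compound(
    name="toy_a",
    atoms=(Atom("a1", "n", 0.8), Atom("a2", "o", -0.4), Atom("a3", "c", 0.1)),
    bonds=(Bond("a1", "a2", "double"), Bond("a1", "a3", "single")),
    label="positive",
)
toy_b = Compound(
    name="toy_b",
    atoms=(Atom("b1", "c", 0.0), Atom("b2", "o", -0.3)),
    bonds=(Bond("b1", "b2", "single"),),
    label="negative",
)

if __name__ == "__main__":
    print(covered([toy_a, toy_b], "n", "o", "double"))
```

In this sketch, a pattern such as "a nitrogen atom double-bonded to an oxygen atom" plays the role of a candidate common substructure: it combines characteristics of individual parts (the elements) with a relation between them (the bond kind). An ILP learner would search over many such patterns expressed as first-order clauses; the sketch only checks a single fixed pattern.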