A Randomized Exhaustive Propositionalization Approach for Molecule Classification

Authors:
Michele Samorani;Manuel Laguna;Robert Kirk DeLisle;Daniel C. Weaver
Affiliations:
Leeds School of Business, University of Colorado at Boulder, Boulder, Colorado 80309;Leeds School of Business, University of Colorado at Boulder, Boulder, Colorado 80309;Array BioPharma, Boulder, Colorado 80301;Array BioPharma, Boulder, Colorado 80301
Venue:
INFORMS Journal on Computing
Year:
2011

Abstract

Drug discovery is the process of designing compounds that have desirable properties, such as activity and nontoxicity. Molecule classification techniques are used alongside this process to predict the properties of candidate compounds and thereby expedite their testing. Ideally, the classification rules found should be accurate and reveal novel chemical properties, but current molecule representation techniques yield inadequate accuracy and limited knowledge discovery. This work extends the propositionalization approach recently proposed for multirelational data mining in two ways: it generates expressive attributes exhaustively, and it uses randomization to sample a limited set of complex (“deep”) attributes. Our experimental tests show that the procedure generates meaningful and interpretable attributes from molecular structural data, and that these attributes are effective for classification.
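
To make the idea of propositionalization concrete, the following is a minimal illustrative sketch and not the authors' procedure. It flattens a toy relational description of molecules into one attribute row per molecule, enumerating simple aggregate attributes exhaustively and drawing only a random sample of more complex ("deep") attributes. The toy data, the attribute definitions, and all names (`molecules`, `shallow_attributes`, `sample_deep_attributes`, `propositionalize`) are assumptions made for illustration only.

```python
import random
from itertools import product
from statistics import mean

# Hypothetical toy data: each molecule is a list of atoms; each atom has an
# element, a partial charge, and the indices of the atoms it is bonded to.
molecules = {
    "m1": [
        {"element": "C", "charge": 0.1, "bonded_to": [1, 2]},
        {"element": "O", "charge": -0.4, "bonded_to": [0]},
        {"element": "N", "charge": -0.2, "bonded_to": [0]},
    ],
    "m2": [
        {"element": "C", "charge": 0.0, "bonded_to": [1]},
        {"element": "C", "charge": 0.2, "bonded_to": [0]},
    ],
}

AGGREGATES = {"min": min, "max": max, "mean": mean}
ELEMENTS = ["C", "N", "O"]


def shallow_attributes():
    """Exhaustively enumerate simple attributes: every aggregate of charge
    per element, plus a count of atoms per element."""
    attrs = {}
    for (agg_name, agg), element in product(AGGREGATES.items(), ELEMENTS):
        def attr(atoms, agg=agg, element=element):
            values = [a["charge"] for a in atoms if a["element"] == element]
            return agg(values) if values else None  # None: no such atom
        attrs[f"{agg_name}_charge_of_{element}"] = attr
    for element in ELEMENTS:
        def count(atoms, element=element):
            return sum(1 for a in atoms if a["element"] == element)
        attrs[f"count_{element}"] = count
    return attrs


def sample_deep_attributes(k, rng):
    """Randomly sample k 'deep' attributes of the form
    'number of X atoms bonded to at least one Y atom'."""
    chosen = rng.sample(list(product(ELEMENTS, ELEMENTS)), k)
    attrs = {}
    for x, y in chosen:
        def attr(atoms, x=x, y=y):
            return sum(
                1 for a in atoms
                if a["element"] == x
                and any(atoms[j]["element"] == y for j in a["bonded_to"])
            )
        attrs[f"count_{x}_bonded_to_{y}"] = attr
    return attrs


def propositionalize(molecules, attrs):
    """Build a flat table: one row per molecule, one column per attribute."""
    return {
        mol_id: {name: fn(atoms) for name, fn in attrs.items()}
        for mol_id, atoms in molecules.items()
    }


rng = random.Random(0)  # fixed seed so the sampled attributes are repeatable
attributes = {**shallow_attributes(), **sample_deep_attributes(3, rng)}
table = propositionalize(molecules, attributes)
```

The resulting `table` can be passed to any standard attribute-value classifier. Sampling is applied only to the deep attributes because their number grows combinatorially with depth, whereas the shallow ones remain cheap to enumerate in full.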