Summarizing gene-expression-based classifiers by meta-mining comprehensible relational patterns

Authors:
Filip Železný;Olga Štěpánková;Jakub Tolar;Nada Lavrač
Affiliations:
Department of Cybernetics, Czech Technical Univ. in Prague, Praha, Czech Republic;Department of Cybernetics, Czech Technical Univ. in Prague, Praha, Czech Republic;Department of Pediatrics, Univ. of Minnesota Medical School, Minneapolis;Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
Venue:
BioMed'06 Proceedings of the 24th IASTED international conference on Biomedical engineering
Year:
2006


Abstract

We propose a methodology for predictive classification from gene expression data that combines the robustness of high-dimensional statistical classification methods with the comprehensibility and interpretability of simple logic-based models. We first construct a robust classifier that combines contributions from a large number of gene expression values, and then (meta-)mine this classifier for compact descriptions of subgroups of genes that it associates with a given class. The subgroups are described by relational logic features extracted from publicly available gene ontology information. The curse of dimensionality that affects gene-expression-based classification, caused by the large number of attributes (genes) relative to the number of samples, becomes an advantage in the secondary meta-mining task, because there the original attributes become the learning examples, of which there are many. We cross-validate the proposed method on two classification problems: (i) acute lymphoblastic leukemia (ALL) vs. acute myeloid leukemia (AML), and (ii) seven subclasses of ALL.
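The role reversal at the heart of the meta-mining step (genes turning from attributes into examples) can be illustrated with a minimal sketch. It assumes, for illustration only, a linear base classifier in which each gene carries a single weight; each gene becomes one meta-level example, labeled positive under a hypothetical rule (its weight exceeds a threshold), and is described by a set of boolean flags that stand in for the relational gene-ontology features. Candidate subgroup descriptions (conjunctions of flags) are scored with weighted relative accuracy (WRAcc), the quality measure used in CN2-SD subgroup discovery. The function names, threshold rule, toy genes and flag names are all hypothetical, and the exhaustive search is a simple stand-in for whatever rule learner the authors actually use.

```python
from itertools import combinations

# Hypothetical toy data. Weights stand in for a trained linear base classifier
# (positive weight = gene votes for the target class, e.g. ALL rather than AML);
# the boolean flags stand in for relational gene-ontology features.
WEIGHTS = {"g1": 0.9, "g2": 0.7, "g3": -0.4, "g4": 0.5, "g5": -0.8, "g6": -0.1}
FEATURES = {
    "g1": frozenset({"immune_response", "membrane"}),
    "g2": frozenset({"immune_response"}),
    "g3": frozenset({"kinase_activity", "membrane"}),
    "g4": frozenset({"immune_response", "kinase_activity"}),
    "g5": frozenset({"kinase_activity"}),
    "g6": frozenset({"membrane"}),
}


def meta_examples(weights, features, threshold=0.0):
    """Turn each gene (an attribute of the base task) into one meta-level example.

    Hypothetical labeling rule: the gene is positive when its weight exceeds
    `threshold`, i.e. it supports the target class in the base classifier.
    """
    return [(features[gene], w > threshold) for gene, w in weights.items()]


def wracc(examples, conjunction):
    """Weighted relative accuracy of the rule `conjunction -> positive`:
    WRAcc = p(Cond) * (p(Pos | Cond) - p(Pos))."""
    covered = [pos for feats, pos in examples if conjunction <= feats]
    if not covered:
        return 0.0
    n = len(examples)
    p_cond = len(covered) / n
    p_pos = sum(pos for _, pos in examples) / n
    p_pos_given_cond = sum(covered) / len(covered)
    return p_cond * (p_pos_given_cond - p_pos)


def best_subgroups(examples, max_len=2, top_k=3):
    """Score every conjunction of up to `max_len` flags; return the top_k."""
    all_feats = sorted(set().union(*(feats for feats, _ in examples)))
    scored = [
        (wracc(examples, frozenset(combo)), combo)
        for k in range(1, max_len + 1)
        for combo in combinations(all_feats, k)
    ]
    # Highest WRAcc first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_k]


if __name__ == "__main__":
    for score, combo in best_subgroups(meta_examples(WEIGHTS, FEATURES)):
        print(f"{score:+.3f}  {' AND '.join(combo)}")
```

On this toy data the top-scoring description is `immune_response` (WRAcc 0.25), which covers exactly the three positively weighted genes. WRAcc is used here because it trades off a subgroup's size against how much its class distribution departs from the overall one, which favors compact yet general descriptions; the measure the paper actually optimizes may differ.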