Induction of comprehensible models for gene expression datasets by subgroup discovery methodology

Authors:
Dragan Gamberger; Nada Lavrač; Filip Železný; Jakub Tolar
Affiliations:
Laboratory for Information Systems, Rudjer Bošković Institute, Bijenička 54, 10000 Zagreb, Croatia; Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia and Nova Gorica Polytechnic, Vipavska 13, 5000 Nova Gorica, Slovenia; Department of Cybernetics, Czech Institute of Technology (CVUT FEL), Technická 2, 16627 Prague, Czech Republic and Department of Biostatistics, University of Wisconsin Medical School, 1300 Un ...; Institute of Human Genetics, University of Minnesota Medical School, 420 Delaware Street, 55455 Minneapolis
Venue:
Journal of Biomedical Informatics - Special issue: Biomedical machine learning
Year:
2004

Abstract

Finding disease markers (classifiers) in gene expression data with machine learning algorithms carries a high risk of overfitting, due to the abundance of attributes (simultaneously measured gene expression values) and the shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and, consequently, the biological insights they can provide. The goal of this study is to apply to this domain a methodology for constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations.