On the Importance of Comprehensible Classification Models for Protein Function Prediction

Authors:
Alex A. Freitas;Daniela C. Wieser;Rolf Apweiler
Affiliations:
University of Kent, Canterbury;European Bioinformatics Institute, Cambridge;European Bioinformatics Institute, Cambridge
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

The literature on protein function prediction is currently dominated by work aimed at maximizing predictive accuracy. This focus neglects the validation and interpretation of the discovered knowledge, which can lead to biologically meaningful insights and hypotheses and so advance biologists' understanding of protein function. This paper critically evaluates the accuracy-centered approach and argues for a perspective that considers not only predictive accuracy but also the comprehensibility of the induced protein function prediction models. It offers two main contributions. First, it presents the case for discovering comprehensible protein function prediction models from data and discusses their advantages in detail: increasing the biologist's confidence in the system's predictions, yielding new insights about the data and new biological hypotheses, and helping to detect errors in the data. Second, it critically reviews the pros and cons of several knowledge representations that can support the discovery of comprehensible protein function prediction models.