Why neural networks should not be used for HIV-1 protease cleavage site prediction

Authors:
Thorsteinn Rögnvaldsson;Liwen You
Affiliations:
Intelligent Systems Laboratory, School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, 301 18 Halmstad, Sweden;Intelligent Systems Laboratory, School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, 301 18 Halmstad, Sweden
Venue:
Bioinformatics
Year:
2004

Abstract

Summary: Several published papers have used nonlinear machine learning algorithms, e.g. artificial neural networks, support vector machines and decision trees, to model the specificity of the HIV-1 protease and to extract specificity rules. We show that the dataset used in these studies is linearly separable, so applying nonlinear classifiers to this problem is a misuse of them. The best solution on this dataset is achieved with a linear classifier such as the simple perceptron or the linear support vector machine, and it is straightforward to extract rules from these linear models. We identify key residues in peptides that are efficiently cleaved by the HIV-1 protease and list the most prominent rules, relating them to experimental results for the HIV-1 protease.

Motivation: Understanding HIV-1 protease specificity is important when designing HIV inhibitors, and several different machine learning algorithms have been applied to the problem. However, little progress has been made in understanding the specificity because nonlinear and overly complex models have been used.

Results: We show that the problem is much easier than previously reported and that linear classifiers such as the simple perceptron or linear support vector machines are at least as good predictors as nonlinear algorithms. We also show how sets of specificity rules can be generated from the resulting linear classifiers.

Availability: The datasets used are available at http://www.hh.se/staff/bioinf/
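The linear-separability argument can be illustrated with a minimal sketch. It is not the authors' code: it assumes peptides are fixed-length octamers encoded with a sparse orthogonal (one-hot) scheme, and the six peptides and their labels are made-up toy strings, not entries from the dataset. A simple perceptron that reaches zero training errors demonstrates that its training set is linearly separable, and the largest weights can then be read off as candidate (position, residue) rules. The helper names `encode`, `train_perceptron` and `top_rules` are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_AA = len(AMINO_ACIDS)


def encode(peptide):
    """One-hot encode a peptide: 20 binary inputs per sequence position."""
    x = np.zeros(len(peptide) * N_AA)
    for pos, aa in enumerate(peptide):
        x[pos * N_AA + AA_INDEX[aa]] = 1.0
    return x


def train_perceptron(X, y, max_epochs=1000):
    """Simple perceptron with labels in {-1, +1}.

    Returns (w, b, converged). converged is True only if a full pass over
    the data produced no misclassification, i.e. a separating hyperplane
    was found and the training set is linearly separable.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified or on the boundary
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def top_rules(w, k=3):
    """List the k largest positive weights as (position, residue, weight)."""
    order = np.argsort(w)[::-1][:k]
    return [(int(i) // N_AA, AMINO_ACIDS[int(i) % N_AA], float(w[i])) for i in order]


# Toy data (assumption, for illustration only): label +1 means "cleaved".
peptides = ["AAAFAAAA", "GGGFGGGG", "SQNFPIVQ",
            "AAAKAAAA", "GGGKGGGG", "SQNKPIVQ"]
labels = np.array([1, 1, 1, -1, -1, -1])

X = np.array([encode(p) for p in peptides])
w, b, converged = train_perceptron(X, labels)
predictions = np.sign(X @ w + b)

print("linearly separable:", converged)
print("strongest positive rules:", top_rules(w))
```

Positions here are zero-based indices into the octamer, so index 3 corresponds to the residue immediately before the scissile bond (P1 in the usual protease notation). On real data the same check applies, but failure to converge within `max_epochs` does not by itself prove that the data are not separable.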