Major histocompatibility complex class I (MHC I) molecules belong to a large and diverse protein superfamily whose families can be divided into three groups according to the type of ligand they can accommodate (ligand-type specificity): peptides, lipids, or none. Here, we assembled a dataset of MHC I proteins of known ligand-type specificity (the MHCI556 dataset) and used it to train k-nearest neighbor and support vector machine classifiers. In cross-validation, the resulting classifiers predicted the ligand-type specificity of MHC I molecules with an accuracy ≥ 99%, using solely their amino acid composition. By holding out entire MHC I families prior to model building, we showed that machine learning (ML)-based classifiers trained on amino acid composition can predict the ligand-type specificity of MHC I molecules unrelated to those used for model building. Moreover, these classifiers outperform BLAST at predicting the class of MHC I molecules that do not bind any ligand.
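To make the feature representation concrete, the following is a minimal sketch of how amino acid composition could be computed and fed to a nearest-neighbor classifier. It is an illustration only, not the implementation used in this work: the function names `aa_composition` and `knn_predict`, the Euclidean distance, the default `k`, and the toy sequences are all assumptions introduced here.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard characters (e.g. 'X', gaps) are ignored. The result is a
    20-element list whose entries sum to 1.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train, query_sequence, k=1):
    """Predict a label by majority vote among the k nearest training sequences.

    `train` is a list of (sequence, label) pairs. Distance is Euclidean
    distance between composition vectors (an assumption of this sketch).
    """
    query = aa_composition(query_sequence)
    neighbors = sorted(
        (math.dist(aa_composition(seq), query), label) for seq, label in train
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: these strings are NOT real MHC I sequences.
toy_train = [
    ("GSHSMRYFYTAVSRPG", "peptide"),
    ("EVQQLLWKKFNSTWAI", "lipid"),
    ("DDPPKTHVTHHPISDH", "none"),
]
toy_prediction = knn_predict(toy_train, "GSHSLRYFYTAVSRPG", k=1)
```

Because the feature vector depends only on residue frequencies, no sequence alignment is needed, which is what allows such a classifier to be applied to proteins unrelated to the training set.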