A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Bioinformatics: the machine learning approach
Bioinformatics: the machine learning approach
Information Retrieval
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Protein homology detection by HMM--HMM comparison
Bioinformatics
Bioinformatics
Supervised machine learning algorithms for protein structure classification
Computational Biology and Chemistry
Hi-index | 0.00 |
One of the major research directions in bioinformatics is that of assigning superfamily classification to a given set of proteins. The classification reflects the structural, evolutionary, and functional relatedness. These relationships are embodied in a hierarchical classification, such as the Structural Classification of Protein (SCOP), which is mostly manually curated. Such a classification is essential for the structural and functional analyses of proteins. Yet a large number of proteins remain unclassified. In this study, we have proposed an unsupervised machine learning approach to classify and assign a given set of proteins to SCOP superfamilies. In the method, we have constructed a database and similarity matrix using P-values obtained from an all-against-all BLAST run and trained the network with the ART2 unsupervised learning algorithm using the rows of the similarity matrix as input vectors, enabling the trained network to classify the proteins from 0.82 to 0.97 f-measure accuracy. The performance of ART2 has been compared with that of spectral clustering, Random forest, SVM, and HHpred. ART2 performs better than the others except HHpred. HHpred performs better than ART2 and the sum of errors is smaller than that of the other methods evaluated.