Protein remote homology detection is a critical step toward annotating protein structure and function. Supervised learning algorithms such as support vector machines (SVMs) are currently the most accurate methods. Position-specific score matrices (PSSMs) contain rich information about the evolutionary relationships of proteins. However, PSSMs differ in length from protein to protein, which makes them difficult for machine-learning methods to use directly. In this study, a simple, fast and powerful method is presented for protein remote homology detection, which combines a support vector machine with the auto-cross covariance (ACC) transformation. The PSSMs are converted into fixed-length vectors by the ACC transformation, and these vectors are then input to an SVM classifier for remote homology detection. This scheme effectively captures sequence-order effects. Experiments are performed on well-established datasets, with remote homology simulated at the superfamily level and at the fold level. The results show that the proposed method, referred to as ACCRe, is comparable to or even better than state-of-the-art methods in terms of detection performance, and that its time complexity is better than that of other profile-based SVM methods. The ACC transformation provides a novel way of using evolutionary information and may be widely applicable to protein-level studies.
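The abstract does not give the formula for the auto-cross covariance transformation, so the following is only a minimal sketch under the commonly used definition: for each lag, the covariance between a PSSM column and the same column (auto) or a different column (cross) shifted by that lag, averaged over positions. The function name `acc_transform`, the parameter `max_lag`, and the row-per-position PSSM layout are assumptions for illustration, not the authors' implementation.

```python
from typing import List


def acc_transform(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Map an L x N PSSM (one row per sequence position, one column per
    residue type) to a fixed-length vector of max_lag * N * N values.

    Assumed definition: for lag g and columns a, b,
        ACC(a, b, g) = sum_j (S[j][a] - mean_a) * (S[j+g][b] - mean_b) / (L - g)
    where a == b gives the auto covariance and a != b the cross covariance.
    """
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence length must exceed max_lag")
    n_cols = len(pssm[0])

    # Center each column on its mean over the whole sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs separated by `lag`
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features.append(total / span)
    return features
```

Because the output length depends only on `max_lag` and the number of columns, PSSMs of proteins with different lengths map to vectors of the same size, which is what allows them to be passed to a standard SVM classifier. For a 20-column PSSM this sketch yields 400 values per lag.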