Integrated Computer-Aided Engineering
In this paper, we propose an unsupervised hybrid framework for protein sequence clustering and classification that incorporates protein structural motif information. The framework consists of three stages: protein structural motif scanning, hybrid clustering, and sequence classification. Incorporating the structural motifs detected by the ScanProsite service yields a better measure of sequence similarity. The proposed two-phase hybrid clustering approach combines the strengths of hierarchical and partition clustering. Phase I uses hierarchical agglomerative clustering to pre-cluster the multiply aligned sequences. Phase II performs partition clustering, which initializes its partition from the Phase I result and uses profile Hidden Markov Models (HMMs) to represent the clusters. The profile HMMs are then stored in a database for classifying unknown sequences, which is done by finding the best alignment of a sequence to each existing profile HMM. Our experiments demonstrate the effectiveness and efficiency of the proposed framework for biological sequence clustering and classification.
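The two-phase flow described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each sequence is reduced to a set of motif labels (the labels M1, M2, ... are made up), Jaccard distance over those sets stands in for the motif-aware similarity measure, and a cluster medoid stands in for the profile HMM that represents each cluster. All function names are hypothetical.

```python
from itertools import combinations


def motif_distance(a, b):
    """Jaccard distance between two motif sets (assumed stand-in measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(items, k, dist):
    """Phase I: average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(dist(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


def medoid(items, cluster, dist):
    """Member with the smallest total distance to the rest of its cluster."""
    return min(cluster,
               key=lambda m: sum(dist(items[m], items[j]) for j in cluster))


def refine(items, clusters, dist, max_iter=20):
    """Phase II: partition clustering seeded with the Phase I result.

    Medoids are an assumed substitute for the profile HMMs of the paper.
    """
    for _ in range(max_iter):
        medoids = [medoid(items, c, dist) for c in clusters]
        new = [[] for _ in medoids]
        for i in range(len(items)):
            best = min(range(len(medoids)),
                       key=lambda c: dist(items[i], items[medoids[c]]))
            new[best].append(i)
        new = [c for c in new if c]  # drop clusters that became empty
        if new == clusters:
            break
        clusters = new
    return clusters, [medoid(items, c, dist) for c in clusters]


def classify(query, items, medoids, dist):
    """Assign an unknown sequence to the cluster whose representative fits best."""
    return min(range(len(medoids)),
               key=lambda c: dist(query, items[medoids[c]]))


# Hypothetical motif sets for six sequences; labels are illustrative only.
seqs = [{"M1", "M2"}, {"M1", "M2", "M3"}, {"M1"},
        {"M7", "M8"}, {"M7", "M8", "M9"}, {"M8"}]

initial = agglomerative(seqs, 2, motif_distance)
clusters, medoids = refine(seqs, initial, motif_distance)
label = classify({"M7"}, seqs, medoids, motif_distance)
```

Seeding the partition step with the hierarchical result avoids the random initialization that partition methods usually depend on, which is the benefit the hybrid design aims for. A faithful version would replace the medoid with a profile HMM built from the cluster's alignment and score unknown sequences by their best alignment to each model.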