FIK Model: Novel Efficient Granular Computing Model for Protein Sequence Motifs and Structure Information Discovery

Authors:
Bernard Chen;Phang C. Tai;Robert Harrison;Yi Pan
Affiliations:
Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA;Georgia State University, Atlanta, GA
Venue:
BIBE '06 Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering
Year:
2006

Citing 0
Cited 2

Statistical estimate for the size of the protein structural vocabulary
ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications

Sparse nonnegative matrix factorization for protein sequence motif discovery
Expert Systems with Applications: An International Journal

Abstract

Protein sequence motif information is very important to the analysis of biologically significant regions. Conserved regions have the potential to determine the conformation, function, and activities of proteins. The main purpose of this paper is to obtain protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large, and an efficient technique is required. In this article, short recurring segments of proteins are explored using a novel granular computing strategy. First, the Fuzzy C-Means clustering algorithm (FCM) is used to separate the whole dataset into several smaller information granules; an improved K-means clustering algorithm is then applied to each granule to obtain the final results. The structural similarity of the clusters discovered by our approach is studied to analyze how the recurring patterns correlate with their structure. Some biochemical references are also included in our evaluation. To the best of our knowledge, this is the first time that the granular computing concept and the Davies-Bouldin Index (DBI) evaluation measure have been applied to this dataset. Compared with the latest research results, our method requires only twenty percent of the execution time and obtains even higher-quality protein sequence motif information. The efficient and satisfactory results of our experiments suggest that our granular computing model, which combines FCM and improved K-means, has a good chance of being applied in other bioinformatics research fields with strong results.
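
The two-stage idea in the abstract (FCM to split the data into granules, then K-means inside each granule) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. Several things here are assumptions made only for illustration: plain K-means stands in for the paper's improved K-means, a hypothetical membership `threshold` decides which points join a granule, the function names are invented, and toy 2-D points replace real protein sequence segments.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Return memberships u[i][j] of point i in fuzzy cluster j."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)
    dims = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for p in points:
            inv = [max(dist(p, ctr), 1e-12) ** (-2.0 / (m - 1.0))
                   for ctr in centers]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Center update: mean of all points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                            for k in range(dims)])
    return u


def k_means(points, k, iters=50, seed=0):
    """Plain K-means (an assumption; the paper uses an improved variant)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def granular_cluster(points, n_granules, k_per_granule, threshold=0.3):
    """Stage 1: FCM granules. Stage 2: K-means inside each granule."""
    u = fuzzy_c_means(points, n_granules)
    results = []
    for j in range(n_granules):
        # Assumed rule: a point joins every granule where its membership
        # reaches the threshold, and always its highest-membership granule.
        granule = [p for p, row in zip(points, u)
                   if row[j] >= threshold or row[j] == max(row)]
        if not granule:
            continue
        k = min(k_per_granule, len(granule))
        centers, labels = k_means(granule, k)
        results.append((granule, centers, labels))
    return results


# Toy data: four tight blobs forming two well-separated groups.
rng = random.Random(1)
blob_centers = [(0, 0), (1, 0), (10, 10), (11, 10)]
points = [[cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)]
          for cx, cy in blob_centers for _ in range(10)]
results = granular_cluster(points, n_granules=2, k_per_granule=2)
for granule, centers, labels in results:
    print(len(granule), "points ->", len(centers), "clusters")
```

Because each K-means run sees only one granule rather than the whole dataset, the per-run cost drops, which is the kind of saving the abstract attributes to the granular approach. The actual speedup depends on the data and on the granule sizes.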