Clustering 100,000 Protein Structure Decoys in Minutes

Authors:
Shuai Cheng Li;Dongbo Bu;Ming Li
Affiliations:
City University of Hong Kong, Hong Kong;Chinese Academy of Sciences, Beijing;University of Waterloo, Waterloo
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Ab initio protein structure prediction methods first generate large sets of candidate structural conformations (called decoys), and then select the most representative decoys through clustering. Classical clustering methods are inefficient because they compute all pairwise distances, and thus become infeasible when the number of decoys is large. In addition, existing clustering approaches must choose a distance threshold for proteins within a cluster, and this choice is arbitrary: a small threshold leads to many small clusters, while a large threshold merges several independent clusters into one. In this paper, we propose an efficient clustering method based on fast estimation of cluster centroids and efficient pruning of the rotation space. The number of clusters is detected automatically by information distance criteria. The method is implemented in a package named ONION, which can be downloaded freely. Experimental results on benchmark data sets suggest that ONION is 14 times faster than existing tools. Compared with SPICKER's selections, ONION obtains better selections for 31 targets and worse selections for 19 targets. On an average PC, ONION can cluster 100,000 decoys in around 12 minutes.
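The two difficulties named in the abstract can be illustrated with a small sketch. It is not ONION's algorithm: the function names, the choice of k = 10 centroids, and the one-dimensional integer values standing in for structural distances are all assumptions made for illustration. The first part counts distance evaluations for an all-against-all comparison versus a comparison against a fixed number of centroids; the second shows how the cluster count of a simple threshold rule depends on the threshold.

```python
def pairwise_comparisons(n):
    """Number of distinct decoy pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


def centroid_comparisons(n, k, iterations=1):
    """Comparisons when each decoy is compared only with k centroids
    (k and iterations are illustrative assumptions)."""
    return n * k * iterations


def threshold_clusters(values, threshold):
    """Toy 1-D stand-in for threshold clustering: after sorting, a point
    joins the current cluster if its gap to the previous point is within
    the threshold; otherwise it starts a new cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


n = 100_000
print(pairwise_comparisons(n))        # 4999950000
print(centroid_comparisons(n, k=10))  # 1000000

# Hypothetical 1-D positions (integers, arbitrary units).
toy = [0, 4, 9, 30, 33, 60, 62]
for t in (1, 6, 30):
    print(t, len(threshold_clusters(toy, t)))
# 1 7
# 6 3
# 30 1
```

With 100,000 decoys, the all-against-all count is about five billion distance evaluations, each of which would require a structural superposition in practice. In the toy data, the smallest threshold leaves every point in its own cluster, while the largest merges the three visible groups into one.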