A soft hierarchical algorithm for the clustering of multiple bioactive chemical compounds

  • Authors:
  • Jehan Zeb Shah;Naomie B. T. Salim

  • Affiliations:
  • Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia, Malaysia;Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia, Malaysia

  • Venue:
  • BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
  • Year:
  • 2007

Abstract

Most methods used to cluster chemical structures, such as Ward's, group average, K-means and Jarvis-Patrick, are hard (crisp) methods: they partition a dataset into strictly disjoint subsets and are therefore unsuitable for clustering chemical structures that exhibit more than one activity. Fuzzy clustering algorithms such as fuzzy c-means provide an inherent mechanism for clustering overlapping structures (objects), but this potential, which comes from their fuzzy membership functions, has not been utilized effectively. In this work, a fuzzy hierarchical algorithm is developed that benefits not only from the fuzzy clustering process but also from the multiple membership values it produces. The algorithm divides every cluster whose size exceeds a predetermined threshold into two subclusters based on the membership values of each structure. A structure is assigned to a single subcluster if its membership value for that subcluster is very high, or to both subclusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and on a large dataset of compound structures derived from MDL's MDDR database. The results show a significant improvement over a similar implementation of the hard c-means algorithm.
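The splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuzzy_two_means`, `soft_hierarchical_clusters`), the parameter `overlap_tol` used to decide when two membership values count as "very similar", the deterministic initialisation, the fuzzifier `m = 2`, and the use of plain numeric tuples instead of chemical fingerprints are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_two_means(points, m=2.0, iters=50):
    """Fuzzy c-means with c = 2; returns one (u0, u1) membership pair per point.

    Assumption: centres start at the first point and the point farthest from it.
    """
    first = points[0]
    far = max(points, key=lambda p: sq_dist(p, first))
    centres = [first, far]
    dim = len(first)
    memberships = []
    for _ in range(iters):
        # Membership update: inverse-distance weights, normalised to sum to 1.
        memberships = []
        for p in points:
            inv = [(1.0 / max(sq_dist(p, c), 1e-12)) ** (1.0 / (m - 1.0))
                   for c in centres]
            total = sum(inv)
            memberships.append((inv[0] / total, inv[1] / total))
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for k in range(2):
            weights = [u[k] ** m for u in memberships]
            wsum = sum(weights)
            centres.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)))
    return memberships


def soft_hierarchical_clusters(points, max_size, overlap_tol=0.1, indices=None):
    """Recursively bisect clusters larger than max_size.

    A point whose two memberships differ by at most overlap_tol (a hypothetical
    threshold) is placed in both subclusters; otherwise it goes to the
    subcluster with the higher membership. Returns lists of point indices.
    """
    if indices is None:
        indices = list(range(len(points)))
    if len(indices) <= max_size:
        return [indices]
    memberships = fuzzy_two_means([points[i] for i in indices])
    left, right = [], []
    for idx, (u0, u1) in zip(indices, memberships):
        if abs(u0 - u1) <= overlap_tol:
            left.append(idx)
            right.append(idx)
        elif u0 > u1:
            left.append(idx)
        else:
            right.append(idx)
    # Stop if the split did not shrink the cluster; this guarantees termination.
    if len(left) == len(indices) or len(right) == len(indices):
        return [indices]
    return (soft_hierarchical_clusters(points, max_size, overlap_tol, left)
            + soft_hierarchical_clusters(points, max_size, overlap_tol, right))
```

Called on two tight groups of points plus one point midway between them, with `max_size` set just above the group size, the sketch would place the midway point in both resulting clusters, which is the overlapping behaviour a hard c-means split cannot produce.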