Additional limitations of the clustering validation method figure of merit

  • Authors:
  • Amy L. Olex;David J. John;Elizabeth M. Hiltbold;Jacquelyn S. Fetrow

  • Affiliations:
  • Wake Forest University, Winston-Salem, NC;Wake Forest University, Winston-Salem, NC;Wake Forest University, Winston-Salem, NC;Wake Forest University, Winston-Salem, NC

  • Venue:
  • ACM-SE 45 Proceedings of the 45th annual southeast regional conference
  • Year:
  • 2007

Abstract

Clustering analysis is an important exploratory tool that aids in the analysis and organization of genomic data. Each biological data set has different characteristics, and deciding, dataset by dataset, which clustering method is appropriate and how many clusters are optimal can be problematic. The Figure of Merit (FOM) is a quantitative clustering validation method designed to aid in these decisions. While the FOM is useful, it has limitations that must be considered when using it. This research shows that the FOM is biased toward Euclidean distance. Performing FOM analysis on clusters created using Pearson's correlation coefficient as a similarity measure is shown to be non-optimal and mathematically inadvisable. A new, correlation coefficient-biased version of the FOM has been developed, and preliminary results indicate that this new FOM is effectively biased toward clusters generated using the correlation coefficient.
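
The abstract does not say how the FOM is computed. The following is a minimal sketch only, assuming the commonly used leave-one-condition-out formulation: each condition (column) is removed in turn, the genes (rows) are clustered on the remaining conditions, and the FOM for that condition is the root mean squared deviation of the left-out values from their cluster means. The function names and the `split_by_row_mean` clusterer are hypothetical stand-ins for illustration and are not taken from the paper.

```python
import numpy as np


def fom_for_condition(data, labels, e):
    """FOM for left-out condition (column) e, given cluster labels for the rows."""
    n = data.shape[0]
    sq_dev = 0.0
    for c in np.unique(labels):
        values = data[labels == c, e]
        # Squared deviation of the left-out values from their cluster mean.
        sq_dev += np.sum((values - values.mean()) ** 2)
    return np.sqrt(sq_dev / n)


def aggregate_fom(data, cluster_fn, k):
    """Sum the per-condition FOM over every leave-one-out condition."""
    total = 0.0
    for e in range(data.shape[1]):
        reduced = np.delete(data, e, axis=1)  # cluster without condition e
        labels = np.asarray(cluster_fn(reduced, k))
        total += fom_for_condition(data, labels, e)
    return total


def split_by_row_mean(reduced, k):
    """Hypothetical stand-in clusterer: rank rows by mean, cut into k groups."""
    order = np.argsort(reduced.mean(axis=1), kind="stable")
    labels = np.empty(len(order), dtype=int)
    for c, idx in enumerate(np.array_split(order, k)):
        labels[idx] = c
    return labels


# Toy data (assumed for illustration): two obvious groups of rows.
toy = np.array([[1.0, 1.0, 1.0],
                [1.0, 1.0, 1.0],
                [5.0, 5.0, 5.0],
                [5.0, 5.0, 5.0]])
fom_two_clusters = aggregate_fom(toy, split_by_row_mean, 2)
fom_one_cluster = aggregate_fom(toy, split_by_row_mean, 1)
```

Under this assumed definition, a lower FOM indicates tighter clusters on the left-out condition. Because the score is built from squared deviations around a cluster mean, it rewards clusters whose members are close in magnitude, which is consistent with the Euclidean bias the abstract describes; profiles that correlate strongly but differ in scale would be penalized.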