A novel clustering approach and prediction of optimal number of clusters: global optimum search with enhanced positioning

Authors:
Meng Piao Tan;James R. Broach;Christodoulos A. Floudas
Affiliations:
Department of Chemical Engineering, Princeton University, Princeton, USA 08544;Department of Molecular Biology, Princeton University, Princeton, USA 08544;Department of Chemical Engineering, Princeton University, Princeton, USA 08544
Venue:
Journal of Global Optimization
Year:
2007

Citing 19

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics.
Algorithms for clustering data.
Self-organization and associative memory: 3rd edition.
Linearization strategies for a class of zero-one mixed integer programming problems. Operations Research.
APROS: algorithmic development methodology for discrete-continuous optimization problems. Operations Research.
Self-organizing maps.
Cluster analysis and mathematical programming. Mathematical Programming: Series A and B - Special issue: papers from ISMP97, the 16th International Symposium on Mathematical Programming, Lausanne EPFL.
Finding salient regions in images: nonparametric clustering for image segmentation and grouping. Computer Vision and Image Understanding - Special issue on content-based access for image and video libraries.
Data clustering: a review. ACM Computing Surveys (CSUR).
Pattern Recognition with Fuzzy Objective Function Algorithms.
Clustering Algorithms.
Cluster validity methods: part I. ACM SIGMOD Record.
A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization.
An Optimal Graph Theoretic Approach to Data Clustering: Theory and Its Application to Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Information Theoretic Clustering of Sparse Co-Occurrence Data. ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining.
A Global Optimization RLT-based Approach for Solving the Hard Clustering Problem. Journal of Global Optimization.
A Global Optimization RLT-based Approach for Solving the Fuzzy Clustering Problem. Journal of Global Optimization.
Deterministic Global Optimization: Theory, Methods and Applications (Nonconvex Optimization and Its Applications, Volume 37).
Survey of clustering algorithms. IEEE Transactions on Neural Networks.

Cited 6

Clustering of document collection - A weighting approach. Expert Systems with Applications: An International Journal.
A review of recent advances in global optimization. Journal of Global Optimization.
Mathematical modeling and efficient optimization methods for the distance-dependent rearrangement clustering problem. Journal of Global Optimization.
Performance evaluation of density-based clustering methods. Information Sciences: an International Journal.
A network flow model for biclustering via optimal re-ordering of data matrices. Journal of Global Optimization.
Solving the Order-Preserving Submatrix Problem via Integer Programming. INFORMS Journal on Computing.

Abstract

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies is a useful tool for identifying biologically relevant gene groupings (DeRisi et al. 1997; Weiler et al. 1997). It is therefore important to apply a rigorous yet intuitive clustering algorithm to uncover these genomic relationships. In this study, we describe a novel clustering framework based on a variant of the Generalized Benders Decomposition, denoted the Global Optimum Search (Floudas et al. 1989; Floudas 1995), which includes a procedure to determine the optimal number of clusters. The approach involves pre-clustering the data points to define an initial number of clusters, followed by the iterative solution of a Linear Programming problem (the primal problem) and a Mixed-Integer Linear Programming problem (the master problem), both derived from a Mixed-Integer Nonlinear Programming formulation. Badly placed data points are removed to form new clusters, which ensures tight groupings among the data points and increments the number of clusters until the optimal number is reached. We apply the proposed algorithm to experimental DNA microarray data centered on the Ras signaling pathway in the yeast Saccharomyces cerevisiae and compare the results with those obtained from some commonly used clustering algorithms. Our algorithm compares favorably with these algorithms in terms of intra-cluster similarity and inter-cluster dissimilarity, often considered two key tenets of clustering. Furthermore, our algorithm can predict the optimal number of clusters, and the biological coherence of the predicted clusters is analyzed through gene ontology.
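
The two criteria named in the abstract, intra-cluster similarity and inter-cluster dissimilarity, can be quantified in several ways. The following is a minimal sketch, assuming squared Euclidean distance and centroid-based definitions; these definitions, the function names, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def intra_cluster_error(clusters):
    """Sum of squared distances from each point to its own cluster centroid.

    Lower values mean tighter clusters (assumed definition).
    """
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sq_dist(p, c) for p in points)
    return total


def inter_cluster_error(clusters):
    """Sum of squared distances from each cluster centroid to the global centroid.

    Higher values mean better-separated clusters (assumed definition).
    """
    all_points = [p for points in clusters for p in points]
    g = centroid(all_points)
    return sum(sq_dist(centroid(points), g) for points in clusters)


# Hypothetical toy data: the same four 2-D points, grouped two different ways.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 2.0)], [(10.0, 0.0), (0.0, 2.0)]]

for name, clusters in (("tight", tight), ("mixed", mixed)):
    print(name, intra_cluster_error(clusters), inter_cluster_error(clusters))
```

On this toy data the well-separated grouping has the lower intra-cluster error and the higher inter-cluster error, which is the direction a favorable comparison on both criteria would take under these assumed definitions.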