ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Clustering for bioinformatics via matrix optimization
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Hi-index | 0.00 |
One of the fundamental clustering problems is to assign $n$ points into $k$ clusters based on minimal sum-of-squared distances (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for several clustering approaches such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via the linear programming and SDP relaxations. Second, we consider the issue of how to extract a feasible solution of the original 0-1 SDP model from the optimal solution of the relaxed SDP problem. By using principal component analysis, we develop a rounding procedure to construct a feasible partitioning from a solution of the relaxed problem. In our rounding procedure, we need to solve a K-means clustering problem in $\Re^{k-1}$, which can be done in $O(n^{k^2-2k+2})$ time. In case of biclustering, the running time of our rounding procedure can be reduced to $O(n\log n)$. We show that our algorithm provides a 2-approximate solution to the original problem. Promising numerical results for biclustering based on our new method are reported.