Take a walk and cluster genes: a TSP-based approach to optimal rearrangement clustering

Authors:
Sharlee Climer; Weixiong Zhang
Affiliations:
Washington University in St. Louis, St. Louis, MO; Washington University in St. Louis, St. Louis, MO
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Abstract

Cluster analysis is a fundamental problem and technique in many areas related to machine learning. In this paper, we consider rearrangement clustering: the problem of finding sets of objects that share common or similar features by arranging the rows (objects) of a feature matrix so that adjacent objects are similar to each other under a given similarity measure, maximizing the overall similarity. By formulating this problem as the Traveling Salesman Problem (TSP), we develop a new TSP-based optimal clustering algorithm called TSPCluster. We overcome a flaw inherent in previous approaches by relaxing restrictions on dissimilarities between clusters. Our new algorithm has three important features: it finds the optimal k clusters for a given k, automatically detects cluster borders, and ascertains a set of the most viable clustering results, which strike a good balance between maximizing the overall similarity within clusters and the dissimilarity between clusters. We apply TSPCluster to cluster and display about 500 genes of the flowering plant Arabidopsis that are regulated under various abiotic stress conditions. We compare TSPCluster to the bond energy algorithm and two existing clustering algorithms. Our TSPCluster code is available at (Climer & Zhang, 2004).
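To make the idea of rearrangement clustering concrete, the following is a minimal sketch, not the TSPCluster implementation. It assumes squared Euclidean distance as the dissimilarity measure, and it replaces a real TSP solver with brute-force enumeration, so it is practical only for a handful of rows. The names `dissimilarity` and `rearrange_and_cut` are hypothetical. One reading of "relaxing restrictions on dissimilarities between clusters" is that the k-1 edges crossing cluster borders are left out of the objective; the sketch follows that reading.

```python
from itertools import permutations


def dissimilarity(a, b):
    """Squared Euclidean distance between two feature rows (an assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rearrange_and_cut(rows, k):
    """Find a row order and k-1 borders minimizing within-cluster adjacent dissimilarity.

    Brute force over all row orders; illustrative only (factorial time).
    Returns (cost, clusters), where clusters is a list of lists of row indices.
    """
    n = len(rows)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of rows")
    dist = [[dissimilarity(rows[i], rows[j]) for j in range(n)] for i in range(n)]

    best_cost, best_clusters = None, None
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # a reversed order has the same cost, so skip mirror images
        # Dissimilarity between each pair of adjacent rows in this order.
        gaps = [dist[order[i]][order[i + 1]] for i in range(n - 1)]
        # For a fixed order, the best borders are the k-1 largest gaps;
        # those edges are excluded from the cost.
        by_size = sorted(range(n - 1), key=lambda i: gaps[i], reverse=True)
        cuts = sorted(by_size[: k - 1])
        cost = sum(gaps) - sum(gaps[i] for i in cuts)
        if best_cost is None or cost < best_cost:
            clusters, start = [], 0
            for c in cuts:
                clusters.append(list(order[start : c + 1]))
                start = c + 1
            clusters.append(list(order[start:]))
            best_cost, best_clusters = cost, clusters
    return best_cost, best_clusters


# Toy feature matrix: rows 0, 2, 4 lie near the origin; rows 1, 3, 5 lie near (10, 10).
rows = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
cost, clusters = rearrange_and_cut(rows, k=2)
print(cost, clusters)
```

On real data such as the roughly 500 genes mentioned in the abstract, enumeration is infeasible, and an exact or heuristic TSP solver would have to take the place of the permutation loop.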