A Fast Quartet Tree Heuristic for Hierarchical Clustering

Authors:
Rudi L. Cilibrasi; Paul M. B. Vitányi
Affiliations:
National Research Institute for Mathematics and Computer Science (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands; National Research Institute for Mathematics and Computer Science (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands, and University of Amsterdam, Amsterdam, The Netherlands
Venue:
Pattern Recognition
Year:
2011

Abstract

The Minimum Quartet Tree Cost problem is to construct an optimal-weight tree from the $3\binom{n}{4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so the optimal tree may embed every quartet as a nonoptimal topology). We present a Monte Carlo heuristic, based on randomized hill-climbing, for approximating the optimal-weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been used extensively for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic that reduces the running time by a factor on the order of a thousand to ten thousand. All of this is implemented and available as part of the CompLearn package. We compare the performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package, on genomic data for which the latter are optimized.
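To make the objective concrete, the following is a minimal Python sketch, not the CompLearn implementation. It counts the 3·C(n,4) quartet topologies, sums the weights of the topologies that a small ternary tree embeds, and runs a monotone randomized hill-climb. Several things here are assumptions made for illustration: the names `hop_distances`, `tree_cost`, `hill_climb`, and `toy_weight`; the toy weight itself (the sum of the two within-pair distances between numeric objects); the five-leaf tree; and the restriction to leaf swaps as the only mutation. A full heuristic would also change the tree shape.

```python
import random
from collections import deque
from itertools import combinations
from math import comb


def hop_distances(adj, source):
    """Breadth-first search: edge counts from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tree_cost(dist, placement, weight):
    """Sum the weights of the quartet topologies embedded in the tree.

    placement maps each leaf node to the object sitting there. In a
    ternary tree, the embedded topology ab|cd of a quartet is the
    pairing whose two paths are disjoint, which is the pairing with
    the strictly smallest summed path length.
    """
    total = 0.0
    for a, b, c, d in combinations(sorted(placement), 4):
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        (p, q), (r, s) = min(
            pairings,
            key=lambda pr: dist[pr[0][0]][pr[0][1]] + dist[pr[1][0]][pr[1][1]],
        )
        total += weight(placement[p], placement[q], placement[r], placement[s])
    return total


def hill_climb(dist, placement, weight, steps, rng):
    """Monotone randomized hill-climbing using leaf swaps only (assumption)."""
    best = dict(placement)
    best_cost = tree_cost(dist, best, weight)
    history = [best_cost]
    for _ in range(steps):
        x, y = rng.sample(sorted(best), 2)
        trial = dict(best)
        trial[x], trial[y] = trial[y], trial[x]
        cost = tree_cost(dist, trial, weight)
        if cost < best_cost:  # keep a mutation only if it lowers the cost
            best, best_cost = trial, cost
        history.append(best_cost)
    return best, history


def toy_weight(u, v, w, x):
    """Hypothetical weight of topology uv|wx: sum of within-pair distances."""
    return abs(u - v) + abs(w - x)


# Hypothetical ternary tree: leaves a..e, internal nodes x, y, z.
ADJ = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
    "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"],
}
LEAVES = ["a", "b", "c", "d", "e"]
DIST = {leaf: hop_distances(ADJ, leaf) for leaf in LEAVES}
NUM_TOPOLOGIES = 3 * comb(len(LEAVES), 4)  # 3 * C(n, 4) candidate topologies

START = {"a": 1, "b": 9, "c": 5, "d": 2, "e": 10}
BEST, HISTORY = hill_climb(DIST, START, toy_weight, 200, random.Random(0))
```

Because a mutation is kept only when it strictly lowers the cost, `HISTORY` never increases, which mirrors the monotonic approximation described above. Hill-climbing of this kind can still stall in a local optimum, so the result depends on the random seed and the number of steps.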