The Minimum Quartet Tree Cost problem is to construct an optimal-weight tree from the $3\binom{n}{4}$ weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so the optimal tree may embed every quartet as a nonoptimal topology). We present a Monte Carlo heuristic, based on randomized hill-climbing, for approximating the optimal-weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram that has all objects as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been used extensively for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains, and across domains with heterogeneous data. We also present a greatly improved heuristic that reduces the running time by a factor on the order of a thousand to ten thousand. All of this is implemented and available as part of the CompLearn package. We compare the performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package, on genomic data for which the latter are optimized.
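To make the objective concrete, the following is a minimal Python sketch of how the summed weight of the quartet topologies embedded in a given tree could be evaluated. It is not the CompLearn implementation: the function names, the adjacency-dictionary tree representation, the five-leaf example tree, and the layout of the `weight` dictionary are all assumptions made for illustration. It relies on the fact that in an unrooted tree whose internal nodes all have degree 3, the embedded topology uv|wx of a quartet is the pairing whose two leaf-to-leaf paths are disjoint, which is the pairing with the smallest summed path length.

```python
from collections import deque
from itertools import combinations
from math import comb


def quartet_topologies(a, b, c, d):
    """Return the three pairings of four leaves: ab|cd, ac|bd, ad|bc."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def path_lengths(tree, source):
    """Breadth-first search giving the edge-count distance from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in tree[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def embedded_topology(dist, a, b, c, d):
    """Return the pairing uv|wx with the smallest d(u,v) + d(w,x).

    Assumes every internal node has degree 3, so the minimum is unique.
    """
    return min(
        quartet_topologies(a, b, c, d),
        key=lambda t: dist[t[0][0]][t[0][1]] + dist[t[1][0]][t[1][1]],
    )


def tree_cost(tree, leaves, weight):
    """Sum the weights of the quartet topologies embedded in the tree.

    `weight` maps each topology tuple (built from sorted leaf labels) to a number.
    """
    dist = {leaf: path_lengths(tree, leaf) for leaf in leaves}
    return sum(
        weight[embedded_topology(dist, *quartet)]
        for quartet in combinations(sorted(leaves), 4)
    )


# Hypothetical example: leaves a..e, internal nodes x, y, z (each of degree 3).
LEAVES = ["a", "b", "c", "d", "e"]
TREE = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
    "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"],
}

# All 3 * C(n, 4) topologies, here given an arbitrary uniform weight of 1.0.
ALL_TOPOLOGIES = [
    t for q in combinations(LEAVES, 4) for t in quartet_topologies(*q)
]
assert len(ALL_TOPOLOGIES) == 3 * comb(len(LEAVES), 4)
UNIFORM_WEIGHT = {t: 1.0 for t in ALL_TOPOLOGIES}
EXAMPLE_COST = tree_cost(TREE, LEAVES, UNIFORM_WEIGHT)
```

A randomized hill-climbing search of the kind described above would repeatedly mutate the tree and keep a mutation only when a score of this form improves; the mutation step is deliberately left out of this sketch.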