On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics)
SIAM Journal on Computing
Rank aggregation methods for the Web
Proceedings of the 10th international conference on World Wide Web
Machine Learning
A tight bound on approximating arbitrary metrics by tree metrics
Journal of Computer and System Sciences - Special issue: STOC 2003
Fitting tree metrics: Hierarchical clustering and Phylogeny
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
Clustering with qualitative information
Journal of Computer and System Sciences - Special issue: Learning theory 2003
ACM Transactions on Knowledge Discovery from Data (TKDD)
Aggregating inconsistent information: Ranking and clustering
Journal of the ACM (JACM)
Deterministic Pivoting Algorithms for Constrained Ranking and Clustering Problems
Mathematics of Operations Research
Δ additive and Δ ultra-additive maps, Gromov's trees, and the Farris transform
Discrete Applied Mathematics
Approximating the best-fit tree under Lp norms
APPROX'05/RANDOM'05 Proceedings of the 8th international workshop on Approximation, Randomization and Combinatorial Optimization Problems, and Proceedings of the 9th international conference on Randomization and Computation: algorithms and techniques
Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e., some measure of the difference between the tree metric and the given data). This problem arises in constructing an $M$-level hierarchical clustering of objects (or an ultrametric on objects) that matches the given dissimilarity data, a basic problem in statistics. Viewed in this way, the problem generalizes the correlation clustering problem, which corresponds to $M=1$. We give a very simple randomized combinatorial algorithm for the $M$-level hierarchical clustering problem that achieves an approximation ratio of $M+2$. This generalizes a previous factor 3 algorithm for correlation clustering on complete graphs.

The problem of fitting tree metrics also arises in phylogeny, where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by the $\ell_p$ norm of the difference between the constructed tree metric and the given data. Previous results obtained a factor 3 approximation for finding the closest tree metric under the $\ell_\infty$ norm, but no nontrivial approximation was known for general $\ell_p$ norms. We present a novel linear programming formulation for this problem and use it to obtain an $O((\log n \log \log n)^{1/p})$-approximation to the closest ultrametric under the $\ell_p$ norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings and may be useful in other contexts.
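For $M = 1$ the problem reduces to correlation clustering on a complete graph. The sketch below assumes that the earlier factor 3 algorithm being generalized is the randomized pivot rule from "Aggregating inconsistent information: Ranking and clustering": pick a uniformly random unclustered vertex, group it with all of its remaining "similar" (+) neighbors, and repeat. It illustrates only this $M = 1$ case and is not the paper's $M$-level algorithm. The names `pivot_clustering` and `disagreements` are hypothetical.

```python
import random
from typing import FrozenSet, List, Set


def pivot_clustering(n: int, plus_pairs: Set[FrozenSet[int]], seed: int = 0) -> List[Set[int]]:
    """Randomized pivot clustering for correlation clustering on a complete graph.

    Vertices are 0..n-1. A pair {i, j} in plus_pairs is labeled "similar" (+);
    every other pair is labeled "dissimilar" (-).
    """
    rng = random.Random(seed)
    remaining = set(range(n))
    clusters: List[Set[int]] = []
    while remaining:
        # Pick a pivot uniformly at random among the unclustered vertices.
        pivot = rng.choice(sorted(remaining))
        # The pivot's cluster is the pivot plus all of its unclustered (+) neighbors.
        cluster = {pivot} | {v for v in remaining if frozenset((pivot, v)) in plus_pairs}
        clusters.append(cluster)
        remaining -= cluster
    return clusters


def disagreements(n: int, plus_pairs: Set[FrozenSet[int]], clusters: List[Set[int]]) -> int:
    """Count (+) pairs split across clusters plus (-) pairs placed in the same cluster."""
    label = {v: idx for idx, c in enumerate(clusters) for v in c}
    cost = 0
    for i in range(n):
        for j in range(i + 1, n):
            together = label[i] == label[j]
            similar = frozenset((i, j)) in plus_pairs
            cost += together != similar
    return cost


# Two (+) cliques with no (+) pairs between them: every pivot recovers them exactly.
plus = {frozenset(p) for p in [(0, 1), (0, 2), (1, 2), (3, 4)]}
print(pivot_clustering(5, plus, seed=1))  # clusters {0, 1, 2} and {3, 4}, in some order
```

The factor 3 guarantee for this rule holds in expectation over the random pivot choices. On an inconsistent input such as the (+) path 0-1-2, any pivot order yields exactly one disagreement. Either 0 and 2 end up together despite being (-), or a (+) pair is split.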
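The view of an ultrametric as a hierarchy of clusterings can be made concrete for an integer-valued ultrametric $U$ with values in $\{0, \dots, M\}$. For each threshold $t = 1, \dots, M$, the relation $U(i,j) < t$ is an equivalence relation, because $U(i,k) \le \max(U(i,j), U(j,k))$. Its classes form a clustering, and these clusterings only merge as $t$ grows. Then $U(i,j)$ equals the number of levels at which $i$ and $j$ are separated. The sketch below is hypothetical throughout (function names, the list-of-lists matrix layout, and counting each unordered pair once in the $\ell_p$ error are assumptions). It illustrates this representation only, not the paper's linear program or its rounding.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def is_ultrametric(u: Matrix) -> bool:
    """Zero diagonal, symmetric, and u[i][k] <= max(u[i][j], u[j][k]) for all i, j, k."""
    n = len(u)
    for i in range(n):
        if u[i][i] != 0:
            return False
        for j in range(n):
            if u[i][j] != u[j][i]:
                return False
            for k in range(n):
                if u[i][k] > max(u[i][j], u[j][k]):
                    return False
    return True


def level_clustering(u: Matrix, t: int) -> List[List[int]]:
    """Clusters at threshold t >= 1: i and j share a cluster iff u[i][j] < t."""
    n = len(u)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for i in range(n):
        if not assigned[i]:
            # For an ultrametric, this is exactly the equivalence class of i.
            cluster = [j for j in range(n) if u[i][j] < t]
            for j in cluster:
                assigned[j] = True
            clusters.append(cluster)
    return clusters


def from_levels(u: Matrix, m: int) -> List[List[int]]:
    """Rebuild an ultrametric with values in {0, ..., m} by counting, for each pair,
    the thresholds t = 1..m at which the pair lies in different clusters."""
    n = len(u)
    r = [[0] * n for _ in range(n)]
    for t in range(1, m + 1):
        label = {v: idx for idx, c in enumerate(level_clustering(u, t)) for v in c}
        for i in range(n):
            for j in range(n):
                if label[i] != label[j]:
                    r[i][j] += 1
    return r


def lp_error(d: Matrix, u: Matrix, p: float) -> float:
    """l_p norm (finite p >= 1) of d - u taken over unordered pairs i < j."""
    total = sum(abs(d[i][j] - u[i][j]) ** p for i, j in combinations(range(len(d)), 2))
    return total ** (1.0 / p)


U = [[0, 1, 2, 2], [1, 0, 2, 2], [2, 2, 0, 1], [2, 2, 1, 0]]
D = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]
print(level_clustering(U, 1))  # [[0], [1], [2], [3]]
print(level_clustering(U, 2))  # [[0, 1], [2, 3]]
print(from_levels(U, 2) == U)  # True
print(lp_error(D, U, 1))       # 2.0
```

With $M = 1$ and 0/1 dissimilarity data, the single level is one clustering. The $\ell_1$ error then counts the pairs on which data and clustering disagree, which is the correlation clustering objective.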