Reconstructing the evolutionary history of a set of species is a fundamental problem in biology, and methods for solving it are gauged on two characteristics: accuracy and efficiency. Neighbor Joining (NJ) is a so-called distance-based method that, thanks to its good accuracy and speed, has been embraced by the phylogeny community. It takes the distances between n taxa and produces, in Θ(n^3) time, a phylogenetic tree, i.e., a tree that aims to describe the evolutionary history of the taxa. In addition to performing well in practice, the NJ algorithm has an optimal reconstruction radius. The contribution of this paper is twofold: (1) we present an algorithm called Fast Neighbor Joining (FNJ) with an optimal reconstruction radius and an optimal running time of O(n^2), and (2) we present a greatly simplified proof of the correctness of NJ. Initial experiments show that in practice FNJ has almost the same accuracy as NJ, indicating that an optimal reconstruction radius is of great importance to the good performance of both algorithms. Moreover, we show how an improved running time can be achieved for computing the so-called correction formulas.
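
To make the Θ(n^3) bound for NJ concrete, the following is a minimal Python sketch of the textbook Neighbor Joining procedure (not the FNJ algorithm of this paper, whose details the abstract does not give). The function name `neighbor_joining`, the dict-of-dicts distance format, and the internal node labels `_node0`, `_node1`, ... are assumptions made for illustration. Each of the n - 2 rounds scans all remaining pairs, which is where the cubic running time comes from.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook Neighbor Joining (illustrative sketch).

    names: list of taxon labels.
    dist:  symmetric dict of dicts, dist[a][b] = distance between a and b.
    Returns a list of edges (u, v, length) of the reconstructed tree.
    """
    # Work on a copy so the caller's matrix is left untouched.
    d = {a: dict(dist[a]) for a in names}
    active = list(names)
    edges = []
    next_id = 0  # counter for hypothetical internal node labels

    while len(active) > 2:
        n = len(active)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}

        # Select the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        # This full scan costs O(n^2) per round, hence O(n^3) overall.
        a, b = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )

        # Create the new internal node and the two edges leading to it.
        u = f"_node{next_id}"
        next_id += 1
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((a, u, len_a))
        edges.append((b, u, len_b))

        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in active:
            if c == a or c == b:
                continue
            d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])

        active = [c for c in active if c != a and c != b] + [u]

    # Connect the last two nodes.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges
```

On an additive (tree-like) distance matrix this sketch recovers the generating tree and its edge lengths. Reaching O(n^2), as the abstract states FNJ does, requires avoiding the full pair scan in every round; how FNJ achieves that is not reproduced here.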