Large-scale neighbor-joining with NINJA

Authors:
Travis J. Wheeler
Affiliations:
Department of Computer Science, The University of Arizona, Tucson, AZ
Venue:
WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
Year:
2009

Abstract

Neighbor-joining is a well-established hierarchical clustering algorithm for inferring phylogenies. It begins with observed distances between pairs of sequences, and the clustering order depends on a metric related to those distances. The canonical algorithm requires O(n^3) time and O(n^2) space for n sequences, which precludes application to very large sequence families, e.g., those containing 100,000 sequences. Datasets of this size are available today, and such phylogenies will play an increasingly important role in comparative biology studies. Recent algorithmic advances have greatly sped up neighbor-joining for inputs of thousands of sequences, but are limited to fewer than 13,000 sequences on a system with 4 GB of RAM. In this paper, I describe an algorithm that speeds up neighbor-joining by dramatically reducing the number of distance values that are viewed in each iteration of the clustering procedure, while still computing a correct neighbor-joining tree. This algorithm can scale to inputs larger than 100,000 sequences because of external-memory-efficient data structures. A free implementation may be obtained from http://nimbletwist.com/software/ninja
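
For orientation, the canonical procedure that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of textbook neighbor-joining, not the NINJA algorithm. The function name `neighbor_joining`, the `node0`, `node1`, ... labels, and the edge-list output are assumptions made for this example. It assumes a symmetric distance matrix over at least two sequences. The full pair scan in each iteration is what gives O(n^3) time, and the stored matrix is what gives O(n^2) space.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor-joining (illustrative sketch, not NINJA).

    names: list of sequence labels; dist: symmetric matrix (list of lists).
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    # Working copy of the distances, keyed by cluster label: O(n^2) space.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0  # counter for hypothetical internal-node labels

    while len(active) > 2:
        n = len(active)
        # Sum of distances from each active cluster to all others.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Scan every pair for the minimum of (n - 2) * d(i, j) - r(i) - r(j).
        # This O(n^2) scan, repeated n - 2 times, is the O(n^3) bottleneck.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the joined pair to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{next_id}"
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from the new node to every remaining cluster.
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]

    # Connect the last two clusters directly.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


# Usage: distances read off a four-leaf tree with branch lengths
# A:1, B:2, C:4, D:5 and an internal branch of length 3.
example_names = ["A", "B", "C", "D"]
example_dist = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]
example_edges = neighbor_joining(example_names, example_dist)
```

The pair scan inside the loop reads every remaining distance value in every iteration. According to the abstract, that is the cost the paper's algorithm reduces, by viewing far fewer distance values per iteration while still producing a correct neighbor-joining tree.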