Clustering based distributed phylogenetic tree construction

  • Authors:
  • Esra Ruzgar; Kayhan Erciyes

  • Affiliations:
  • Computer Eng. Dept., Izmir University, Gursel Aksel Bulvari, 14, Uckuyular 35350, Izmir, Turkey (both authors)

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

Phylogenetic tree construction has received much attention recently due to the availability of vast amounts of biological data. In this study, we provide a three-step method to build phylogenetic trees. First, a density-based clustering algorithm partitions the population at hand into clusters, using the distance matrix that gives the pairwise distances between species. Second, a phylogenetic tree is constructed for each cluster using the neighbor-joining (NJ) algorithm. Finally, the roots of these small phylogenetic trees are connected, again by the NJ algorithm, to form one large phylogenetic tree. To our knowledge, this is the first method for building phylogenetic trees that clusters the data before forming the tree. Because the trees in the second step are formed independently within each cluster, the method is suitable for parallel/distributed processing, enabling fast processing of very large biological data sets. The proposed method, clustered neighbor-joining (CNJ), is applied to 145 samples from Y-DNA Haplogroup G. The distance between two male samples is the variation between their sets of Y-chromosomal short tandem repeat (STR) values. We show that the clustering method we use is superior to other clustering methods when applied to Y-DNA data, and that independent, fast, distributed construction of phylogenetic trees is possible with this method.
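The three CNJ steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: DBSCAN stands in for the unspecified density-based clustering algorithm, the STR distance is assumed to be the sum of absolute repeat-count differences per marker, noise points are kept as singleton clusters, and the cluster trees are joined by running NJ on an average-linkage inter-cluster distance matrix, with each cluster tree treated as a leaf. The function names (`str_distance`, `dbscan`, `neighbor_joining`, `clustered_nj`) and the toy haplotypes are hypothetical, and this NJ returns topology only, without branch lengths.

```python
from itertools import combinations


def str_distance(a, b):
    """Hypothetical Y-STR distance: sum of absolute repeat-count differences per marker."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; returns one label per point (-1 = noise)."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_pts:
            labels[p] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = list(neighbors)
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # noise reachable from a core point becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # only core points expand the cluster
                seeds.extend(q_neighbors)
    return labels


def neighbor_joining(dist, names):
    """Saitou-Nei neighbor joining (topology only); returns nested tuples of names.

    Assumes a symmetric matrix with a zero diagonal and unique names.
    """
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a, b] for b in nodes) for a in nodes}  # row sums (d[a, a] == 0)
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[pair] - r[pair[0]] - r[pair[1]])
        u = (a, b)  # new internal node
        for k in nodes:
            if k not in (a, b):
                d[u, k] = d[k, u] = (d[a, k] + d[b, k] - d[a, b]) / 2
        d[u, u] = 0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    return tuple(nodes) if len(nodes) == 2 else nodes[0]


def clustered_nj(dist, names, eps, min_pts):
    """Hypothetical CNJ sketch: cluster, run NJ per cluster, then NJ over the cluster trees."""
    # Step 1: density-based clustering (DBSCAN assumed here).
    labels = dbscan(dist, eps, min_pts)
    groups = {}
    for i, label in enumerate(labels):
        # Assumption: each noise point is kept as its own singleton cluster.
        groups.setdefault(("noise", i) if label == -1 else label, []).append(i)
    members = list(groups.values())
    # Step 2: independent NJ per cluster (the step that could run in parallel).
    subtrees = [neighbor_joining([[dist[i][j] for j in m] for i in m], [names[i] for i in m])
                for m in members]
    # Assumption: inter-cluster distance = average pairwise distance (0 on the diagonal).
    between = [[0.0 if x == y else
                sum(dist[i][j] for i in members[x] for j in members[y])
                / (len(members[x]) * len(members[y]))
                for y in range(len(members))] for x in range(len(members))]
    # Step 3: join the cluster trees with NJ, treating each subtree as a leaf.
    return neighbor_joining(between, subtrees)


if __name__ == "__main__":
    # Toy haplotypes (repeat counts at three hypothetical STR markers).
    haplotypes = {"s1": (14, 12, 23), "s2": (14, 12, 24), "s3": (14, 13, 23),
                  "s4": (17, 10, 20), "s5": (17, 10, 21), "s6": (18, 10, 20)}
    names = list(haplotypes)
    dist = [[str_distance(haplotypes[a], haplotypes[b]) for b in names] for a in names]
    print(clustered_nj(dist, names, eps=2, min_pts=3))
    # (('s3', ('s1', 's2')), ('s6', ('s4', 's5')))
```

In this sketch only the per-cluster `neighbor_joining` calls in step 2 are independent of one another, so in a distributed setting each could be dispatched to a separate worker; here they simply run sequentially.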