Clustering based distributed phylogenetic tree construction

Authors:
Esra Ruzgar; Kayhan Erciyes
Affiliations:
Computer Eng. Dept., Izmir University, Gursel Aksel Bulvari, 14, Uckuyular 35350, Izmir, Turkey (both authors)
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

Phylogenetic tree construction has received much attention recently owing to the availability of vast amounts of biological data. In this study, we provide a three-step method for building phylogenetic trees. First, a density-based clustering algorithm partitions the population at hand into clusters, using the distance matrix that holds the pairwise distances between species. Second, a phylogenetic tree is constructed for each cluster with the neighbor-joining (NJ) algorithm. Finally, the roots of these small phylogenetic trees are connected, again with the NJ algorithm, to form one large phylogenetic tree. To our knowledge, this is the first method for building phylogenetic trees that clusters the data before forming the tree. Because the trees in the second step are built independently within each cluster, the method is suitable for parallel/distributed processing, enabling fast processing of very large biological data sets. The proposed method, clustered neighbor-joining (CNJ), is applied to 145 samples from Y-DNA Haplogroup G, where the distance between two male samples is the variation between their sets of Y-chromosomal short tandem repeat (STR) values. We show that the clustering method we use performs better on Y-DNA data than other clustering methods, and that independent, fast, distributed construction of phylogenetic trees is possible with this method.
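The three steps can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' CNJ implementation. DBSCAN stands in for the density-based clustering algorithm, which is not named here. The STR distance is assumed to be the sum of absolute repeat-count differences across markers. Noise points are assumed to be kept as singleton clusters. In step 3, the distance between two cluster subtrees is assumed to be the average pairwise distance between their members. The function names (`str_distance`, `dbscan`, `nj`, `clustered_nj`) and the toy data are hypothetical.

```python
from itertools import combinations


def str_distance(a, b):
    # ASSUMPTION: distance = sum of absolute repeat-count differences over markers.
    return sum(abs(x - y) for x, y in zip(a, b))


def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; returns one label per point, -1 = noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neigh = [q for q in range(n) if dist[p][q] <= eps]  # includes p itself
        if len(neigh) < min_pts:
            labels[p] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[p] = cluster
        seeds = list(neigh)
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster  # border point: joins the cluster, not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neigh = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neigh) >= min_pts:  # q is a core point: expand through it
                seeds.extend(q_neigh)
    return labels


def nj(names, dist):
    """Neighbor joining (Saitou & Nei); returns a Newick string without the trailing ';'."""
    D = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    if len(nodes) == 1:
        return nodes[0]
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Join the pair that minimises the Q-criterion.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        la = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = 0.5 * (D[a][c] + D[b][c] - D[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return f"({a}:{D[a][b]:g},{b}:0)"  # last edge; placing the root here is a choice


def clustered_nj(samples, eps, min_pts):
    """samples: {name: tuple of STR repeat counts}. Returns (cluster labels, Newick tree)."""
    names = list(samples)
    dist = [[str_distance(samples[a], samples[b]) for b in names] for a in names]
    # Step 1: density-based clustering on the distance matrix.
    labels = dbscan(dist, eps, min_pts)
    groups = {}
    for i, lab in enumerate(labels):
        # ASSUMPTION: each noise point is kept as its own singleton cluster.
        groups.setdefault(lab if lab != -1 else ("noise", i), []).append(i)
    members = list(groups.values())
    # Step 2: one NJ tree per cluster; the calls are independent of each other.
    subtrees = [nj([names[i] for i in g], [[dist[i][j] for j in g] for i in g])
                for g in members]

    def avg_link(gx, gy):
        # ASSUMPTION: subtree-to-subtree distance = mean pairwise member distance.
        return sum(dist[i][j] for i in gx for j in gy) / (len(gx) * len(gy))

    # Step 3: join the cluster subtrees with NJ on the inter-cluster distances.
    k = len(members)
    between = [[0.0 if s == t else avg_link(members[s], members[t]) for t in range(k)]
               for s in range(k)]
    return labels, nj(subtrees, between) + ";"


if __name__ == "__main__":
    # Hypothetical toy data: six samples, four STR markers each.
    samples = {
        "A": (13, 24, 14, 11), "B": (13, 24, 14, 12), "C": (13, 23, 14, 11),
        "D": (15, 22, 16, 10), "E": (15, 22, 16, 11), "F": (15, 21, 16, 10),
    }
    labels, tree = clustered_nj(samples, eps=1, min_pts=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(tree)
```

Each `nj` call in step 2 reads only its own cluster's submatrix. That independence is what makes the step parallelisable. In a distributed setting, those calls could plausibly be dispatched to separate workers, with only the subtrees and the inter-cluster distances gathered for step 3.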