We present a method for automatically extracting groups of orthologous genes from a large set of genomes using a new clustering algorithm on a weighted multipartite graph. The method assigns a score to an arbitrary subset of genes from multiple genomes to assess the orthologous relationships among the genes in the subset. This score is computed from the sequence similarities between the member genes and the phylogenetic relationships between the corresponding genomes. An ortholog cluster is the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in O(|E| + |V| log |V|) time, where V and E are the sets of vertices and edges of the graph, respectively. If the similarity scores are discretized into a constant number of bins, the running time improves to O(|E| + |V|). The proposed method was applied to the seven complete eukaryotic genomes on which KOG, the manually curated database of eukaryotic ortholog clusters, is built. A comparison with the manually curated ortholog clusters shows that our clusters correlate well with the existing ones.
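As a rough illustration of how a highest-scoring subset can be found by repeatedly removing the weakest vertex, the sketch below peels a weighted graph with a priority queue. The abstract does not define the score function, so the code assumes a hypothetical one: the score of a subset is the smallest total similarity that any member has to the other members. The function name `peel_best_subset` and the adjacency format are likewise assumptions for illustration, and the phylogenetic weighting is omitted. Python's binary heap with lazy deletion gives O(|E| log |V|) here, not the O(|E| + |V| log |V|) bound stated above, which would need a heap with constant-time decrease-key.

```python
import heapq


def peel_best_subset(adj):
    """Greedy peeling under an ASSUMED score (not the paper's):
    score(S) = min over v in S of the summed edge weight from v into S.

    adj maps each vertex to a dict {neighbor: non-negative weight};
    the graph is undirected, so every edge appears in both directions.
    Returns (best_subset, best_score).
    """
    # Current weighted degree of every vertex within the remaining set.
    deg = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)

    removed = set()
    order = []  # vertices in the order they are peeled off
    best_score, best_start = float("-inf"), 0

    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != deg[u]:
            continue  # stale heap entry left behind by lazy deletion
        # d is the minimum degree, i.e. the score of the remaining set.
        if d > best_score:
            best_score, best_start = d, len(order)
        removed.add(u)
        order.append(u)
        for v, w in adj[u].items():
            if v not in removed:
                deg[v] -= w
                heapq.heappush(heap, (deg[v], v))

    # The best subset is everything not yet removed at the best step.
    return set(order[best_start:]), best_score


# Toy multipartite example: three genes from three genomes that are
# mutually similar, plus one weakly attached gene from a second copy
# of genome 2 (no edges inside a genome).
example = {
    "g1_a": {"g2_b": 3, "g3_c": 3, "g2_d": 1},
    "g2_b": {"g1_a": 3, "g3_c": 3},
    "g3_c": {"g1_a": 3, "g2_b": 3},
    "g2_d": {"g1_a": 1},
}
cluster, score = peel_best_subset(example)
print(sorted(cluster), score)
```

With similarity scores binned into a constant number of integer levels, the heap could in principle be replaced by an array of buckets indexed by degree, which is the usual route to a linear-time peeling order; that variant is not shown here.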