The order of genes in genomes provides extensive information. In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to identify gene clusters—groups of genes that are colocated in a set of genomes. We introduce a unified approach to model gene clusters and define the problem of labeling the inner nodes of a given phylogenetic tree with sets of gene clusters. Our optimization criterion in this context combines two properties: parsimony, i.e., the number of gains and losses of gene clusters has to be minimal, and consistency, i.e., for each ancestral node, there must exist at least one potential gene order that contains all the reconstructed clusters. We present and evaluate an exact algorithm to solve this problem. Despite its exponential worst-case time complexity, our method is suitable even for large-scale data. We show the effectiveness and efficiency on both simulated and real data.
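The two criteria named above can be illustrated with a minimal sketch. It is not the exact algorithm of the paper, which the abstract does not spell out. As stand-ins, it uses Fitch's small-parsimony recursion to count the gains and losses of a single cluster on a rooted binary tree, and a brute-force search to test whether some linear gene order keeps every cluster contiguous. The function names, the nested-tuple tree encoding, and the contiguity-based cluster model are all assumptions made for illustration.

```python
from itertools import permutations


def fitch_changes(tree, present):
    """Return (state_set, changes) for one cluster on a rooted binary tree.

    tree:    a leaf name (str) or a pair (left, right) of subtrees.
    present: set of leaf names whose genome contains the cluster.
    The change count is the minimum number of gains plus losses
    (Fitch's small parsimony for a binary character).
    """
    if isinstance(tree, str):
        return ({1} if tree in present else {0}), 0
    left, right = tree
    left_states, left_changes = fitch_changes(left, present)
    right_states, right_changes = fitch_changes(right, present)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: one gain or loss is unavoidable here.
    return left_states | right_states, left_changes + right_changes + 1


def is_consistent(genes, clusters):
    """Brute force: does some linear gene order keep every cluster contiguous?

    Assumes every cluster is a non-empty set of genes drawn from `genes`.
    Exponential in the number of genes; only meant for tiny examples.
    """
    for order in permutations(genes):
        pos = {gene: i for i, gene in enumerate(order)}
        if all(
            max(pos[g] for g in c) - min(pos[g] for g in c) == len(c) - 1
            for c in clusters
        ):
            return True
    return False


# Hypothetical four-genome tree and a cluster observed in genomes A and B.
tree = (("A", "B"), ("C", "D"))
_, cost = fitch_changes(tree, {"A", "B"})

# Three clusters that all need gene "a" as a neighbour cannot coexist
# in one linear order, so this labeling would be inconsistent.
conflict = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
ok = is_consistent("abcd", conflict)
```

Treating each cluster independently, as `fitch_changes` does, minimizes gains and losses but can assign an ancestor a set of clusters that fails `is_consistent`. That conflict is why the two criteria have to be optimized together rather than one after the other.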