Identifying gene clusters, genomic regions that share local similarities in gene organization, is a prerequisite for many different types of genomic analyses, including operon prediction, reconstruction of chromosomal rearrangements, and detection of whole-genome duplications. A number of formal definitions of gene clusters have been proposed, as well as methods for finding such clusters and/or statistical tests for determining their significance. Unfortunately, there is very little overlap between previously published rigorous analytical statistical tests and the definitions used in practice. In this paper, we consider the max-gap cluster: a contiguous region containing a maximal set of homologs, where the number of non-homologous genes between pairs of adjacent homologs is never greater than a predefined, fixed parameter, g. Although this is one of the models most widely used in practice, currently the statistical significance of max-gap clusters can only be evaluated using Monte Carlo simulations because no analytical statistical tests have been developed for it. We give exact expressions for the probability of observing such a cluster by chance, assuming a simple reference-region scenario and random gene order, as well as more efficient methods for approximating this probability. We use these methods to identify which regions of the parameter space yield clusters that are statistically significant. Finally, we discuss some of the challenges in extending this model to whole-genome comparison.
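As an illustration of the max-gap definition and of the Monte Carlo baseline mentioned above, the following is a minimal sketch, not the exact expressions or approximations developed in the paper. It assumes a simplified version of the reference-region scenario: a genome is a linear order of `n` genes, `m` of which are homologs of genes in the reference region, and we ask how often all `m` fall into a single max-gap cluster with gap parameter `g` when gene order is random. The function names and the parameters `n`, `m`, `trials`, and `seed` are hypothetical choices made for this example.

```python
import random
from itertools import combinations


def is_max_gap_cluster(positions, g):
    """True if every pair of adjacent marked genes has at most g other genes between them."""
    ordered = sorted(positions)
    return all(b - a - 1 <= g for a, b in zip(ordered, ordered[1:]))


def monte_carlo_cluster_probability(n, m, g, trials=10_000, seed=0):
    """Estimate the chance that m randomly placed genes form one max-gap cluster.

    Assumption: random gene order is modeled by drawing m distinct
    positions uniformly at random from a linear genome of n genes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        positions = rng.sample(range(n), m)
        if is_max_gap_cluster(positions, g):
            hits += 1
    return hits / trials


def exact_cluster_probability(n, m, g):
    """Brute-force reference value; only practical for very small n."""
    total = 0
    clustered = 0
    for positions in combinations(range(n), m):
        total += 1
        if is_max_gap_cluster(positions, g):
            clustered += 1
    return clustered / total


if __name__ == "__main__":
    # Compare the simulated estimate with exhaustive enumeration on a toy genome.
    print(monte_carlo_cluster_probability(n=20, m=4, g=2))
    print(exact_cluster_probability(n=20, m=4, g=2))
```

The enumeration is included only as a sanity check on the simulation for toy inputs; its cost grows combinatorially with `n` and `m`, which is the practical motivation for analytical expressions and efficient approximations.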