Computational identification of evolutionarily conserved exons
RECOMB '04 Proceedings of the eighth annual international conference on Resaerch in computational molecular biology
An Evolutionary Study of the Human Papillomavirus Genomes
RECOMB-CG '08 Proceedings of the international workshop on Comparative Genomics
Hi-index | 0.00 |
So far, most methods for identifying sequences under selection based on comparative sequence data have either assumed selectional pressures are the same across all branches of a phylogeny, or have focused on changes in specific lineages of interest. Here, we introduce a more general method that detects sequences that have either come under selection, or begun to drift, on any lineage. The method is based on a phylogenetic hidden Markov model (phylo-HMM), and does not require element boundaries to be determined a priori, making it particularly useful for identifying noncoding sequences. Insertions and deletions (indels) are incorporated into the phylo-HMM by a simple strategy that uses a separately reconstructed “indel history.” To evaluate the statistical significance of predictions, we introduce a novel method for computing P-values based on prior and posterior distributions of the number of substitutions that have occurred in the evolution of predicted elements. We derive efficient dynamic-programming algorithms for obtaining these distributions, given a model of neutral evolution. Our methods have been implemented as computer programs called DLESS (Detection of LinEage-Specific Selection) and phyloP (phylogenetic P-values). We discuss results obtained with these programs on both real and simulated data sets.