Motivation: The complete sequencing of the human genome and many others has produced huge amounts of DNA sequence data. An important problem in bioinformatics is how to predict the complete structure of genes from genomic DNA sequence, especially in the human genome. A crucial part of gene structure prediction is determining the precise exon-intron boundaries, i.e. the splice sites, in the coding region.

Results: We have developed a dependency graph model to fully capture the intrinsic interdependency between base positions in a splice site. Dependency between two positions is established by a χ²-test on known sample data. To facilitate statistical inference, we have expanded the dependency graph (which usually contains cycles that make probabilistic reasoning very difficult, if not impossible) into a Bayesian network (a directed acyclic graph that facilitates statistical reasoning). Compared with existing models such as the weight matrix model, the weight array model, maximal dependence decomposition, Cai et al.'s tree model, and the less-studied second-order and third-order Markov chain models, the Bayesian networks expanded from our dependency graph models perform best in nearly all the cases studied.

Availability: Software (a program called DGSplicer) and the datasets used are available at http://csrl.ee.nthu.edu.tw/bioinf/

Contact: cclu@ee.nthu.edu.tw
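The χ²-based dependency test mentioned in the Results can be illustrated with a minimal sketch. This is not the DGSplicer implementation: the function names `chi_square` and `dependency_edges`, the toy sequences, and the critical value 16.919 (the 0.05 significance level for 9 degrees of freedom) are all assumptions made for illustration, since the abstract does not state the significance level or the edge-selection rule actually used.

```python
from collections import Counter
from itertools import combinations


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of positions i and j.

    seqs is a list of equal-length aligned strings (hypothetical input format).
    Returns (statistic, degrees_of_freedom).
    """
    n = len(seqs)
    pair = Counter((s[i], s[j]) for s in seqs)   # joint counts
    row = Counter(s[i] for s in seqs)            # marginal counts at position i
    col = Counter(s[j] for s in seqs)            # marginal counts at position j
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (pair[(a, b)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def dependency_edges(seqs, critical=16.919):
    """Return position pairs whose statistic exceeds the critical value.

    Assumption: critical=16.919 is the 0.05 level for 9 degrees of freedom,
    so it is only appropriate when all four bases occur at both positions.
    """
    length = len(seqs[0])
    edges = []
    for i, j in combinations(range(length), 2):
        stat, dof = chi_square(seqs, i, j)
        if dof > 0 and stat > critical:
            edges.append((i, j))
    return edges


# Toy data: positions 0 and 1 always carry the same base; position 2 is
# uniformly distributed and independent of the other two.
toy = [a + a + c for a in "ACGT" for c in "ACGT"] * 5
print(dependency_edges(toy))  # → [(0, 1)]
```

Each returned pair would correspond to one edge of an undirected dependency graph over base positions. The resulting graph may contain cycles, which is why the abstract describes expanding it into a Bayesian network before inference; that expansion step is not sketched here because the abstract does not specify it.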