Prediction of splice sites with dependency graphs and their expanded Bayesian networks

Authors:
Te-Ming Chen; Chung-Chin Lu; Wen-Hsiung Li
Affiliations:
Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan; Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan; Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: Owing to the complete sequencing of the human genome and many other genomes, huge amounts of DNA sequence data have accumulated. An important problem in bioinformatics is how to predict the complete structure of genes from genomic DNA sequence, especially in the human genome. A crucial part of gene structure prediction is determining the precise exon-intron boundaries, i.e. the splice sites, in the coding region.

Results: We have developed a dependency graph model to fully capture the intrinsic interdependency between base positions in a splice site. Dependency between two positions is established by a χ² test on known sample data. A dependency graph usually contains cycles, which make probabilistic reasoning very difficult, if not impossible. To facilitate statistical inference, we have therefore expanded the dependency graph into a Bayesian network, which is a directed acyclic graph. When compared with existing models, namely the weight matrix model, the weight array model, maximal dependence decomposition, Cai et al.'s tree model, and the less-studied second-order and third-order Markov chain models, the expanded Bayesian networks from our dependency graph models perform the best in nearly all the cases studied.

Availability: Software (a program called DGSplicer) and the datasets used are available at http://csrl.ee.nthu.edu.tw/bioinf/

Contact: cclu@ee.nthu.edu.tw
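The χ² test that the Results paragraph uses to establish dependency between two base positions can be illustrated with a minimal Python sketch. This is not the DGSplicer implementation. The function names (chi_square, dependency_edges), the toy sequences, and the threshold are assumptions made for illustration. The threshold 27.877 is the standard critical value of the χ² distribution with 9 degrees of freedom at the 0.001 level; the abstract does not state which significance level the authors used. The sketch covers only the pairwise test and the resulting undirected edges, not the expansion into a Bayesian network.

```python
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of positions i and j.

    seqs is a list of equal-length, aligned splice-site sequences.
    """
    n = len(seqs)
    row = Counter(s[i] for s in seqs)             # base counts at position i
    col = Counter(s[j] for s in seqs)             # base counts at position j
    joint = Counter((s[i], s[j]) for s in seqs)   # joint counts for (i, j)
    stat = 0.0
    for a in BASES:
        for b in BASES:
            expected = row[a] * col[b] / n
            if expected > 0:                      # skip cells with no support
                stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def dependency_edges(seqs, threshold=27.877):
    """Return position pairs whose statistic exceeds the threshold.

    The default threshold is an assumed choice (df = 9, alpha = 0.001),
    not a value taken from the paper.
    """
    length = len(seqs[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if chi_square(seqs, i, j) > threshold
    ]


# Toy data (hypothetical): positions 0 and 1 always carry the same base,
# while position 2 varies independently of both.
toy = [a + a + c for a in BASES for c in BASES]
print(dependency_edges(toy))  # prints [(0, 1)]
```

On real data the edges returned this way can form cycles, which is the situation the abstract describes as the reason for expanding the dependency graph into a directed acyclic Bayesian network.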