Abstract

A wide variety of biologically relevant signals are embedded within DNA sequences. Splice sites are one class of signals that determine the junctions between coding and non-coding regions of DNA, so splice site detection is a critical step in computational gene recognition. However, predicting splice sites is a challenging classification problem, mainly due to the overwhelming abundance of pseudo-sites and, consequently, the relatively small number of positive examples. Models of splice signals based on complex feature spaces may be useful for increased recognition accuracy; however, the relative lack of data can pose a problem for many statistical learning methods in high-dimensional feature spaces. We present a learning approach to donor and acceptor splice site prediction based on the SNoW architecture, a sparse network of classifiers implementing a variant of the multiplicative weight-update algorithm, Winnow, which is known to tolerate high-dimensional feature spaces and to behave robustly in the presence of irrelevant features. These two attributes, which enable a SNoW network to incorporate many different feature types, motivated our attempt to create a SNoW-based splice site predictor, in which an assortment of features based on the local sequence context was used. Accuracy evaluation on several benchmark test sets of human genes indicates that SNoW-based splice site predictors compare favorably with other programs based on local sequence features. SNoW-based learning may be useful for other biological signal prediction tasks.
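The multiplicative weight update mentioned in the abstract can be illustrated with a minimal sketch of a single Winnow unit over sparse binary features. This is not the SNoW implementation or the paper's feature set: the `position_kmers` encoding, the `Winnow` class, the promotion factor `alpha`, the threshold value, and the two toy windows are all assumptions chosen for illustration.

```python
def position_kmers(window, k=1):
    """Return the set of active binary features for a sequence window.

    Each feature is a (position, k-mer) pair. This encoding is an
    illustrative assumption, not the feature set used in the paper.
    """
    return {(i, window[i:i + k]) for i in range(len(window) - k + 1)}


class Winnow:
    """A single Winnow unit with mistake-driven multiplicative updates."""

    def __init__(self, threshold, alpha=2.0):
        self.threshold = threshold  # decision threshold (hypothetical value)
        self.alpha = alpha          # promotion factor; demotion uses 1/alpha
        self.weights = {}           # sparse storage; unseen features weigh 1.0

    def score(self, active):
        # Only active features contribute, so the cost depends on the
        # number of active features, not on the size of the feature space.
        return sum(self.weights.get(f, 1.0) for f in active)

    def predict(self, active):
        return self.score(active) >= self.threshold

    def update(self, active, label):
        # Weights change only when the current prediction is wrong.
        if self.predict(active) == label:
            return
        # Promote on a missed positive, demote on a false positive.
        factor = self.alpha if label else 1.0 / self.alpha
        for f in active:
            self.weights[f] = self.weights.get(f, 1.0) * factor


def train(examples, threshold, epochs=5):
    """Run several passes of mistake-driven updates over labeled windows."""
    model = Winnow(threshold)
    for _ in range(epochs):
        for window, label in examples:
            model.update(position_kmers(window), label)
    return model


# Toy data (made up): one window containing GT, one window without it.
toy = [("AAGTAA", True), ("AACCAA", False)]
model = train(toy, threshold=8.0)
```

Because weights are stored only for features that have been updated, the model stays small even when the space of possible (position, k-mer) features is large, which is the property the abstract attributes to Winnow in high-dimensional feature spaces.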

Splice Site Prediction Using a Sparse Network of Winnows

Abstract