A high recall DNA splice site prediction based on association analysis
ACS'10 Proceedings of the 10th WSEAS international conference on Applied computer science
Through statistical analysis of the donor site sequences in the HS3D dataset, the patterns in which bases occur at positions adjacent to the splice sites are used to construct motifs, which then serve as attributes of the DNA sequences. By assigning a value to each attribute, the literal sequences are transformed into quasi-numeric vectors, on which a decision tree model (the C4.5 algorithm) is built to predict splice sites. Experimental results indicate that, compared with the improved motif-scoring model of Maisheng Yin, the proposed method effectively reduces the influence of abnormal data on the prediction, showing that the new motif-based encoding method is practical and effective.
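The encoding step and the C4.5 splitting criterion can be illustrated with a minimal sketch. It assumes that each motif is a hypothetical (offset, base pattern) pair anchored in a short window around the donor GT, and that each attribute is set to 1 when the motif occurs at its offset and 0 otherwise. The motifs, windows, and labels are invented for illustration and are not the motifs the authors derived from HS3D, whose attribute values may not be binary. Instead of building a full tree, the sketch computes only the gain ratio, the criterion C4.5 uses to choose a split attribute.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical motifs: (offset from window start, base pattern).
# Illustrative assumptions only, not the motifs derived in the paper.
MOTIFS: List[Tuple[int, str]] = [
    (2, "AG"),   # exon-side bases just upstream of the donor GT
    (4, "GT"),   # canonical donor dinucleotide
    (6, "AAG"),  # intron-side consensus-like run
]


def encode(seq: str, motifs: Sequence[Tuple[int, str]] = MOTIFS) -> List[int]:
    """Turn a literal DNA window into a 0/1 vector, one attribute per motif."""
    seq = seq.upper()
    return [1 if seq[off:off + len(pat)] == pat else 0 for off, pat in motifs]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(column: Sequence[int], labels: Sequence[int]) -> float:
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups: Dict[int, List[int]] = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0


# Invented toy windows: 1 = true donor site, 0 = false GT-containing window.
WINDOWS = ["CAAGGTAAG", "TCAGGTAAG", "GGCTGTGAG", "ATTCGTCCA", "GCATGTTTC", "TTAGGTCGC"]
LABELS = [1, 1, 1, 0, 0, 0]

X = [encode(w) for w in WINDOWS]
ratios = [gain_ratio([row[j] for row in X], LABELS) for j in range(len(MOTIFS))]
best = max(range(len(MOTIFS)), key=lambda j: ratios[j])
print(X[0], [round(r, 3) for r in ratios], MOTIFS[best])
```

On this toy data the GT attribute has a gain ratio of 0 because every window contains it. The hypothetical AAG motif separates the classes best, so a C4.5-style tree would split on it first. A complete C4.5 implementation would also recurse on each branch, handle continuous attributes and missing values, and prune the tree.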