Splice Site Prediction Based on Characteristic of Sequential Motifs and C4.5 Algorithm

  • Authors:
  • Hequan Sun;Qinke Peng;Quanwei Zhang;Dan Mou

  • Venue:
  • FSKD '08 Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery - Volume 04
  • Year:
  • 2008

Abstract

Through statistical analysis of the donor site sequences in the HS3D dataset, the patterns in which bases occur at adjacent positions around splice sites are used to construct motifs, which then serve as attributes of the DNA sequences. By assigning a value to each attribute, the literal sequences are transformed into quasi-numeric vectors, on which a decision tree model (the C4.5 algorithm) is built to predict splice sites. The experimental results indicate that, compared with the improved Maisheng Yin’s motif-scoring model, the proposed method effectively reduces the influence of abnormal data on prediction, and that the new motif-based encoding method is practicable and effective.
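
The encoding idea in the abstract can be illustrated with a minimal sketch. The abstract does not list the motifs or the attribute values used, so the motifs, their offsets, the 0/1 encoding, and the toy sequences below are all hypothetical. The sketch also covers only the gain-ratio split criterion of C4.5, not a full tree with pruning or continuous attributes.

```python
import math
from collections import Counter

# Hypothetical motifs as (offset within the window, base string).
# Illustrative only; these are NOT the motifs derived in the paper.
MOTIFS = [(0, "AG"), (2, "GT"), (4, "AAG")]

def encode(seq, motifs=MOTIFS):
    """Map a DNA window to a 0/1 vector: 1 if the motif occurs at its offset."""
    return [int(seq[pos:pos + len(m)] == m) for pos, m in motifs]

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Made-up toy windows: label 1 = true donor site, 0 = pseudo site.
windows = ["AGGTAAGT", "AGGTGAGT", "CTGTAAGA", "AGCCAAGT", "TTGTCCGA", "CAGCAAGG"]
labels = [1, 1, 0, 0, 0, 0]

rows = [encode(w) for w in windows]
ratios = [gain_ratio(rows, labels, a) for a in range(len(MOTIFS))]

# The attribute with the highest gain ratio would be chosen as the root split.
best = max(range(len(MOTIFS)), key=ratios.__getitem__)
print(rows, [round(r, 3) for r in ratios], best)
```

In this toy setting, the attribute with the largest gain ratio would become the root test of the tree, and the same criterion would be applied recursively to each resulting subset.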