High efficiency on prediction of translation initiation site (TIS) of RefSeq sequences

  • Authors: Cristiane N. Nobre; J. Miguel Ortega; Antônio de Pádua Braga

  • Affiliations: Bioinformática, UFMG; Laboratório de Biodados, ICB, UFMG; Engenharia Eletrônica, UFMG

  • Venue: BSB'07 Proceedings of the 2nd Brazilian conference on Advances in bioinformatics and computational biology
  • Year: 2007

Abstract

An important task in gene discovery is the correct prediction of the translation initiation site (TIS). The TIS can correspond to the first AUG in the sequence, but this is not always the case. The task can be modeled as a classification problem between positive patterns (the TIS) and negative patterns (all other candidate sites). Here we use a Support Vector Machine (SVM) trained on data processed by the class-balancing method SMOTE (Synthetic Minority Over-sampling Technique). SMOTE was used because the databases in this work are strongly imbalanced, with an average positive-to-negative pattern ratio of around 1:28. As a result, we attained accuracy, precision, sensitivity and specificity values of 99% on average.
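
The following is a minimal sketch of the ideas named in the abstract, not the authors' implementation. It assumes that every AUG in a transcript is a candidate site and that exactly one annotated position is the true TIS, which is one plausible source of the imbalance described. It then shows a simplified SMOTE-style interpolation between minority-class feature vectors and the four reported metrics computed from a confusion matrix. The function names (`candidate_sites`, `smote_like`, `metrics`), the feature vectors, and the parameters `k` and `seed` are hypothetical, and the SVM training step itself is omitted.

```python
import random


def candidate_sites(mrna, tis_index, codon="AUG"):
    """List every occurrence of `codon` as (position, label).

    Label 1 marks the annotated TIS (positive pattern); every other
    AUG is labelled 0 (negative pattern).
    """
    sites = []
    start = mrna.find(codon)
    while start != -1:
        sites.append((start, 1 if start == tis_index else 0))
        start = mrna.find(codon, start + 1)
    return sites


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: build n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two numeric feature vectors.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and specificity from counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Toy transcript with three AUGs; position 7 is the assumed TIS.
    print(candidate_sites("GGAUGCCAUGAAUGA", tis_index=7))
    # Toy 2-D minority feature vectors (illustrative values only).
    print(smote_like([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=3))
    print(metrics(tp=8, fp=2, tn=278, fn=2))
```

With a ratio near 1:28, a classifier that always predicts "negative" would already reach high accuracy, which is why oversampling the positive class and reporting precision, sensitivity and specificity alongside accuracy is informative.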