Splice Site Recognition in DNA Sequences Using K-mer Frequency Based Mapping for Support Vector Machine with Power Series Kernel

  • Authors: Robertas Damaševicius
  • Venue: CISIS '08 Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems
  • Year: 2008

Abstract

Recognition of specific, functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. One type of such fragment is the splice-junction site (intron-exon or exon-intron boundary), and detecting these sites in DNA sequences is important for successful gene prediction. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize splice sites. To find the best classification, four position-independent, k-mer frequency based methods for mapping DNA sequences into the SVM feature space are analyzed. Classification is performed using SVM power series kernels, and the kernel parameters are optimized using a modification of the Nelder-Mead (downhill simplex) optimization method. Classification quality is evaluated using the F-measure, which combines the precision and recall metrics. The best classification results are achieved with 4-mers on the exon-intron dataset (78%) and with 6-mers on the intron-exon dataset (70%), using 4-nucleotide frequencies.
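
To make two of the ideas in the abstract concrete, the following is a minimal Python sketch of a position-independent k-mer frequency mapping and of an F-measure computation. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not define its four mapping methods, so plain relative k-mer frequencies over the alphabet A, C, G, T are assumed here, and the F-measure is assumed to be the balanced form (harmonic mean of precision and recall). The function names kmer_frequencies and f_measure are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide alphabet; ambiguous symbols are skipped


def kmer_frequencies(sequence, k):
    """Map a DNA sequence to a vector of relative k-mer frequencies.

    The vector has 4**k entries in lexicographic k-mer order and ignores
    where in the sequence each k-mer occurs (position-independent).
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    seq = sequence.upper()
    # Slide a window of width k over the sequence, one nucleotide at a time.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # skip windows containing symbols such as N
            counts[position] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [count / total for count in counts]


def f_measure(true_labels, predicted_labels, positive=1):
    """Balanced F-measure (assumed form): harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, kmer_frequencies(seq, 4) yields a 256-dimensional feature vector and kmer_frequencies(seq, 6) a 4096-dimensional one; such vectors could then be passed to an SVM with whatever kernel is chosen. The power series kernel and the modified Nelder-Mead search are not sketched here because the abstract does not specify them.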