Recognition of functionally important DNA sequence fragments is considered one of the most important problems in bioinformatics. One class of such fragments is splice-junction sites (intron-exon or exon-intron boundaries), and detecting them in DNA sequences is important for successful gene prediction. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize splice sites. To find the best classification setup, four position-independent, k-mer frequency based methods for mapping DNA sequences into the SVM feature space are analyzed. Classification is performed using SVM power series kernels, and the kernel parameters are optimized using a modification of the Nelder-Mead (downhill simplex) optimization method. Classification quality is evaluated using the F-measure, which combines the precision and recall metrics. The best classification results are achieved using 4-mers for the exon-intron dataset (78%) and 6-mers for the intron-exon dataset (70%), using 4-nucleotide frequencies.
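To make the feature mapping and the evaluation metric concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not specify the four mapping methods, so the sketch assumes the simplest plausible one: a sliding window over the sequence with counts normalized to relative frequencies. The F-measure is assumed to be the balanced form (harmonic mean of precision and recall), since the abstract does not state a weighting. The names `kmer_frequencies` and `f_measure` are hypothetical.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: only the four unambiguous nucleotides are counted


def kmer_frequencies(sequence, k):
    """Map a DNA sequence to a position-independent vector of k-mer frequencies.

    The vector has 4**k entries, ordered lexicographically over ACGT.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    # Slide a window of length k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def f_measure(precision, recall):
    """Balanced F-measure (assumed): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: 2-mer frequencies of a short sequence give a 16-dimensional vector.
vec = kmer_frequencies("ACGTAC", 2)
```

Vectors produced this way have a fixed length of 4^k regardless of sequence length, which is what allows them to be passed to an SVM with any kernel defined on real-valued vectors.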