Class imbalance methods for translation initiation site recognition

Authors:
Nicolás García-Pedrajas;Domingo Ortiz-Boyer;María D. García-Pedrajas;Colin Fyfe
Affiliations:
Department of Computing and Numerical Analysis, University of Córdoba, Spain;Department of Computing and Numerical Analysis, University of Córdoba, Spain;Experimental Station La Mayora, CSIC, Málaga, Spain;School of Computing, University of the West of Scotland, United Kingdom
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part I
Year:
2010


Abstract

Translation initiation site (TIS) recognition is one of the first steps in gene structure prediction and a common component of gene recognition systems. Many methods have been described in the literature to identify TIS in transcripts such as mRNA, EST and cDNA sequences. However, recognizing TIS in DNA sequences is a far more challenging task, and the methods described so far for transcripts achieve poor results on DNA sequences. Most methods approach the problem by taking its biological features into account. In this work we take a different view and consider the classification problem from a purely machine learning perspective. From that point of view, TIS recognition is a class imbalance problem. We therefore apply the different methods that have been developed to deal with imbalanced datasets. Results show an advantage of class imbalance methods over the same methods applied without considering the class imbalance nature of the problem. The applied methods also improve on the results of the best method in the literature, which is based on looking for the next in-frame stop codon from the putative TIS that must be predicted.
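
The abstract does not say which class imbalance methods were applied. As an illustration only, the sketch below shows random undersampling, one common technique of this family: all minority-class examples (true TIS) are kept and the majority class (false candidates) is randomly reduced. The function name `random_undersample`, its parameters, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import random


def random_undersample(samples, labels, minority_label=1, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    ratio is the desired number of majority samples per minority sample.
    All names here are hypothetical, chosen only for this illustration.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Never ask for more majority samples than actually exist.
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    # Sort the kept indices so the original ordering is preserved.
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 3 true-TIS windows against 12 false candidates.
samples = [f"window_{i}" for i in range(15)]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
balanced_samples, balanced_labels = random_undersample(samples, labels)
```

With `ratio=1.0` the returned set contains as many majority examples as minority ones. A classifier trained on it sees a balanced problem, at the cost of discarding majority-class data.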