Computer Methods and Programs in Biomedicine
Learning Bayesian classifiers from positive and unlabeled examples
Pattern Recognition Letters
Learning classifiers from only positive and unlabeled data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to Find Relevant Biological Articles without Negative Training Examples
AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Novel H/ACA Box snoRNA Mining and Secondary Structure Prediction Algorithms
RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology
Semi-supervised learning with very few labeled training examples
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Prediction of small non-coding RNA in bacterial genomes using support vector machines
Expert Systems with Applications: An International Journal
Hi-index | 3.84 |
Motivation: Small non-coding RNA (ncRNA) genes play important regulatory roles in a variety of cellular processes. However, detection of ncRNA genes is a great challenge to both experimental and computational approaches. In this study, we describe a new approach called positive sample only learning (PSoL) to predict ncRNA genes in the Escherichia coli genome. Although PSoL is a machine learning method for classification, it requires no negative training data, which, in general, is hard to define properly and affects the performance of machine learning dramatically. In addition, using the support vector machine (SVM) as the core learning algorithm, PSoL can integrate many different kinds of information to improve the accuracy of prediction. Besides the application of PSoL for predicting ncRNAs, PSoL is applicable to many other bioinformatics problems as well. Results: The PSoL method is assessed by 5-fold cross-validation experiments which show that PSoL can achieve about 80% accuracy in recovery of known ncRNAs. We compared PSoL predictions with five previously published results. The PSoL method has the highest percentage of predictions overlapping with those from other methods. Contact: srholbrook@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online.