PSoL: a positive sample only learning algorithm for finding non-coding RNA genes

Authors:
Chunlin Wang; Chris Ding; Richard F. Meraz; Stephen R. Holbrook
Affiliations:
Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Venue:
Bioinformatics
Year:
2006

Abstract

Motivation: Small non-coding RNA (ncRNA) genes play important regulatory roles in a variety of cellular processes. However, detection of ncRNA genes is a great challenge to both experimental and computational approaches. In this study, we describe a new approach called positive sample only learning (PSoL) to predict ncRNA genes in the Escherichia coli genome. Although PSoL is a machine learning method for classification, it requires no negative training data, which, in general, is hard to define properly and affects the performance of machine learning dramatically. In addition, using the support vector machine (SVM) as the core learning algorithm, PSoL can integrate many different kinds of information to improve the accuracy of prediction. Besides the application of PSoL for predicting ncRNAs, PSoL is applicable to many other bioinformatics problems as well.

Results: The PSoL method is assessed by 5-fold cross-validation experiments which show that PSoL can achieve about 80% accuracy in recovery of known ncRNAs. We compared PSoL predictions with five previously published results. The PSoL method has the highest percentage of predictions overlapping with those from other methods.

Contact: srholbrook@lbl.gov

Supplementary information: Supplementary data are available at Bioinformatics online.
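The abstract does not spell out how a classifier can be trained without negative examples, so the following is only a minimal sketch of one generic positive-and-unlabeled scheme, not the published PSoL procedure. It assumes that unlabeled points far from the positives can serve as provisional negatives, trains a small linear SVM (plain batch subgradient descent on the hinge loss, standing in for a full SVM package), and then grows the negative set round by round. The function names, the toy two-dimensional features, and the parameters `n_init`, `cutoff`, and `max_rounds` are all hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                gw = [g - yi * v / n for g, v in zip(gw, xi)]
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def positive_only_candidates(positives, unlabeled, n_init=2, cutoff=-0.5,
                             max_rounds=10):
    """Return (candidate positives, inferred negatives) from unlabeled data.

    Assumption (not from the source): the n_init unlabeled points farthest
    from the centroid of the positives are treated as initial negatives.
    """
    dim = len(positives[0])
    centroid = [sum(p[k] for p in positives) / len(positives)
                for k in range(dim)]

    def sqdist(x):
        return sum((a - c) ** 2 for a, c in zip(x, centroid))

    ranked = sorted(unlabeled, key=sqdist, reverse=True)
    negatives, pool = ranked[:n_init], ranked[n_init:]
    for _ in range(max_rounds):
        X = list(positives) + negatives
        y = [1] * len(positives) + [-1] * len(negatives)
        w, b = train_linear_svm(X, y)
        # Unlabeled points scored clearly negative join the negative set.
        moved = [x for x in pool if dot(w, x) + b < cutoff]
        if not moved:
            break
        negatives = negatives + moved
        pool = [x for x in pool if x not in moved]
    # What is left in the pool and scores positive under the last trained
    # model is reported as a candidate.
    return [x for x in pool if dot(w, x) + b > 0], negatives


# Toy data: known positives plus an unlabeled mix of both classes.
known = [(2.0, 2.1), (1.8, 2.3), (2.2, 1.9), (2.5, 2.0), (1.9, 1.7)]
mixed = [(2.1, 2.0), (1.7, 2.2), (-2.0, -1.9), (-2.2, -2.1),
         (-1.8, -2.3), (-2.4, -1.7), (-1.9, -2.0), (2.3, 2.2)]
candidates, inferred_negatives = positive_only_candidates(known, mixed)
```

In a real setting the feature vectors would combine the different kinds of sequence-derived information the abstract mentions, and held-out known positives (for example in 5-fold cross-validation) would be used to check how many are recovered as candidates.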