Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier

Authors:
Malik Yousef;Michael Nebozhyn;Hagit Shatkay;Stathis Kanterakis;Louise C. Showe;Michael K. Showe
Affiliations:
The Wistar Institute, Philadelphia PA 19104, USA;The Wistar Institute, Philadelphia PA 19104, USA;School of Computing, Queen's University Kingston, Ontario, Canada;The Wistar Institute, Philadelphia PA 19104, USA;The Wistar Institute, Philadelphia PA 19104, USA;The Wistar Institute, Philadelphia PA 19104, USA
Venue:
Bioinformatics
Year:
2006

Citing 0
Cited 7

Machine learning method for knowledge discovery experimented with otoneurological data

Computer Methods and Programs in Biomedicine
Brief communication: Genome-wide computational identification of microRNAs and their targets in the deep-branching eukaryote Giardia lamblia

Computational Biology and Chemistry
MicroRNAs and cancer-the search begins!

IEEE Transactions on Information Technology in Biomedicine
PMirP: A pre-microRNA prediction method based on structure-sequence hybrid features

Artificial Intelligence in Medicine
In silico prediction of noncoding RNAs using supervised learning and feature ranking methods

International Journal of Bioinformatics Research and Applications
Prediction of pre-miRNA with multiple stem-loops using pruning algorithm

Computers in Biology and Medicine

Quantified Score

Hi-index	3.84

Abstract

Motivation: Most computational methods for microRNA (miRNA) gene prediction rely on sequence conservation and/or structural similarity. In this study we describe a new technique for predicting miRNA genes that is applicable across several species. The technique is based on machine learning, using the Naïve Bayes classifier. It automatically generates a model from training data consisting of sequence and structure information of known miRNAs from a variety of species.

Results: Our study shows that applying machine learning techniques, together with integrating data from multiple species, is a useful and general approach for miRNA gene prediction. Based on our experiments, we believe that this new technique is applicable to a wide range of eukaryotic genomes. Specific structure and sequence features are first used to identify miRNAs, followed by a comparative analysis to decrease the number of false positives (FPs). The resulting algorithm exhibits higher specificity and similar sensitivity compared to currently used algorithms that rely on conserved genomic regions to decrease the rate of FPs.

Availability: The BayesMiRNAfind program is available at http://wotan.wistar.upenn.edu/miRNA

Contact: showe@wistar.org

Supplementary information: Supplementary data are available at Bioinformatics online.
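
The classification step described in the abstract can be illustrated with a minimal sketch of a Naïve Bayes classifier trained on sequence and structure features. This is not the BayesMiRNAfind implementation: the feature definitions (sequence 2-mers and dot-bracket structure 3-mers), the class labels, the `NaiveBayes` class, and the toy training sequences are all assumptions made for illustration, and the comparative-analysis filtering step is not shown.

```python
import math
from collections import Counter, defaultdict


def features(seq, struct):
    """Hypothetical features: sequence 2-mers plus dot-bracket structure 3-mers."""
    feats = ["seq:" + seq[i:i + 2] for i in range(len(seq) - 1)]
    feats += ["str:" + struct[i:i + 3] for i in range(len(struct) - 2)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(examples, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training data (invented): hairpin-like candidates vs. unstructured background.
train = [
    ("GGCUAGCUAGCC", "((((....))))", "miRNA"),
    ("GGGAUCGAUCCC", "((((....))))", "miRNA"),
    ("AUAUAUAUAUAU", "............", "background"),
    ("UUUUAAAAUUUU", "..(......)..", "background"),
]
model = NaiveBayes().fit(
    [features(seq, struct) for seq, struct, _ in train],
    [label for _, _, label in train],
)

hairpin_call = model.predict(features("GGCAUCGAUGCC", "((((....))))"))
flat_call = model.predict(features("AUAUUUUAAUAU", "............"))
print(hairpin_call, flat_call)
```

In this sketch, training examples from several species would simply be pooled into one training set, which is one plausible reading of how multi-species data could feed a single model.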