Designing Filters for Fast-Known NcRNA Identification

Authors:
Yanni Sun;Jeremy Buhler;Cheng Yuan
Affiliations:
Michigan State University, East Lansing;Washington University in Saint Louis, Saint Louis;Michigan State University, East Lansing
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2012

Abstract

Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, so that the CM is applied only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be efficiently scanned against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve a significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/
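
To make the idea of a profile augmented with secondary structure more concrete, the following is a minimal Python sketch of a sliding-window filter. It is not the authors' implementation: the scoring scheme (additive per-position scores plus a fixed bonus for each annotated position pair that can form a base pair), the toy profile, and the names `ssp_score`, `scan_ssp`, and `pair_bonus` are all assumptions made for illustration only.

```python
# Illustrative sketch only; the scoring scheme below is an assumption,
# not the SSP definition used in the paper.

# Base pairs treated as compatible: Watson-Crick plus G-U wobble.
COMPATIBLE_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def ssp_score(window, profile, pairs, pair_bonus=1.0):
    """Score one window: per-position profile scores plus a structure bonus.

    profile: list of dicts mapping a base to a score, one dict per column.
    pairs:   list of (i, j) column indices annotated as base-paired.
    """
    # Sequence part: sum of per-column scores (unknown bases score 0).
    sequence_part = sum(col.get(base, 0.0) for base, col in zip(window, profile))
    # Structure part: reward every annotated pair that can actually pair.
    structure_part = sum(
        pair_bonus for i, j in pairs if (window[i], window[j]) in COMPATIBLE_PAIRS
    )
    return sequence_part + structure_part


def scan_ssp(sequence, profile, pairs, threshold, pair_bonus=1.0):
    """Slide the profile along the sequence; return (start, score) for hits."""
    rna = sequence.upper().replace("T", "U")  # accept DNA input as well
    width = len(profile)
    hits = []
    for start in range(len(rna) - width + 1):
        score = ssp_score(rna[start:start + width], profile, pairs, pair_bonus)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 6-column profile: a 2-bp stem around a loop that prefers "AA".
neutral = {"A": 0, "C": 0, "G": 0, "U": 0}
loop_a = {"A": 2, "C": -1, "G": -1, "U": -1}
toy_profile = [neutral, neutral, loop_a, loop_a, neutral, neutral]
toy_pairs = [(0, 5), (1, 4)]

hits = scan_ssp("GCAAGCUUUUUU", toy_profile, toy_pairs, threshold=5)
print(hits)
```

In a filtering pipeline of the kind the abstract describes, only the regions around such hits would be passed on to the more expensive CM search; how the threshold is chosen and how several profiles are combined are design decisions this sketch does not address.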