Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy while making searches significantly faster for virtually all ncRNA families. However, searches with these rigorous filters are still slower than heuristic filters could make them.

Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.

Availability: The source code is available under GNU Public License at the supplementary web site.

Contact: zasha@cs.washington.edu

Supplementary information: http://bio.cs.washington.edu/supplements/zasha-HeurHmm-2004/ (Technical details, results, C++ code)