Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%

  • Authors:
  • Jakob Hull Havgaard; Rune B. Lyngsø; Gary D. Stormo; Jan Gorodkin

  • Affiliations:
  • Center for Bioinformatics and Division of Genetics, IBHV, The Royal Veterinary and Agricultural University, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Department of Statistics, Oxford University, 1 South Parks Road, Oxford, OX1 3TG, UK; Department of Genetics, Washington University School of Medicine, Campus Box 8232, 4566 Scott Avenue, St Louis, MO 63110, USA; Center for Bioinformatics and Division of Genetics, IBHV, The Royal Veterinary and Agricultural University, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Abstract

Motivation: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) is a major challenge in gene finding today, as these are often conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to detect, in a pairwise comparison, two genes with low sequence similarity, where the genes are part of a larger genomic region.

Results: Here we present such an approach for pairwise local alignment, which is based on foldalign and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new foldalign implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity less than 40% [...] foldalign is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme.

Availability: The program is available online at http://foldalign.kvl.dk/

Contact: gorodkin@bioinf.kvl.dk
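The Results report three evaluation measures: Matthews correlation coefficient, sensitivity, and positive predictive value. As a minimal sketch of how these are conventionally computed from confusion counts, the Python below uses the textbook definitions. The function name `structure_metrics` and the example counts are hypothetical, and the abstract does not say how the authors count true negatives, so this should not be read as their evaluation procedure.

```python
import math

def structure_metrics(tp, fp, fn, tn):
    """Return (sensitivity, ppv, mcc) from confusion counts.

    tp, fp, fn, tn are hypothetical counts, for example of predicted
    versus annotated base pairs. Undefined ratios are returned as 0.0.
    """
    # Sensitivity: fraction of true items that were predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Positive predictive value: fraction of predictions that are true.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient, textbook definition.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc

# Made-up counts for illustration only.
sens, ppv, mcc = structure_metrics(tp=8, fp=2, fn=2, tn=88)
print(f"sensitivity={sens:.2f} ppv={ppv:.2f} mcc={mcc:.2f}")
```

With a very large number of true negatives, as is typical when most possible base pairs are absent, the coefficient is commonly approximated by the geometric mean of sensitivity and positive predictive value.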