Space and time efficient algorithms to discover endogenous RNAi patterns in complete genome data

  • Authors: Sudha Balla; Sanguthevar Rajasekaran
  • Affiliations: University of Connecticut, Storrs, CT (both authors)
  • Venue: ISBRA'07: Proceedings of the 3rd International Conference on Bioinformatics Research and Applications
  • Year: 2007

Abstract

RNAi, short for RNA interference, is a phenomenon that inhibits the expression of genes. It is widely adopted in laboratories for studying pathways and determining gene function. Recent studies have shown that RNAi could be used to treat diseases such as cancers and some genetic disorders, in which the down-regulation of a protein could prevent or stop progression of the disease. In [7], the authors discuss the problem of detecting endogenous dsRNA control elements and their corresponding mRNA targets (i.e., the genes under RNAi control by degradation) in the complete genomes of species using a suffix tree data structure. While that algorithm identifies triple repeats in the genome sequence in linear time, its very high memory requirement (12 GB for the 100 Mbp C. elegans genome) becomes a bottleneck when processing higher-order genomes. In this paper, we give algorithms that are more space and time efficient in practice than the suffix-tree-based algorithm. Our algorithms are based on simple array data structures and adopt basic sorting techniques to identify the desired patterns in a given genome sequence. For the C. elegans genome, we achieve a 23-fold speedup and a 12-fold reduction in memory requirement over the suffix tree approach, which makes it possible to process higher-order genomes when detecting such endogenous controls and targets for RNAi by degradation.
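
To illustrate the general idea of replacing a suffix tree with a plain array and a sort, the following is a minimal sketch in Python. It is not the authors' algorithm: the function name `repeated_kmers`, the fixed window length `k`, and the `min_count` threshold are assumptions made for illustration only, and the sketch ignores reverse complements and any other constraints that the paper's triple repeats may involve. It sorts the start positions of all length-`k` windows by the substring they begin, so that equal substrings become adjacent and can be grouped in one pass.

```python
from itertools import groupby


def repeated_kmers(genome: str, k: int, min_count: int = 3) -> dict[str, list[int]]:
    """Return every k-mer that occurs at least min_count times,
    mapped to its start positions in ascending order.

    Hypothetical helper for illustration; not taken from the paper.
    """
    def kmer_at(i: int) -> str:
        return genome[i:i + k]

    # Array of all valid window start positions (empty if the genome is shorter than k).
    starts = list(range(len(genome) - k + 1))

    # Sort positions by the k-mer they begin. Python's sort is stable,
    # so positions that share a k-mer stay in ascending order.
    starts.sort(key=kmer_at)

    # Equal k-mers are now adjacent; group them and keep the frequent ones.
    repeats: dict[str, list[int]] = {}
    for kmer, group in groupby(starts, key=kmer_at):
        positions = list(group)
        if len(positions) >= min_count:
            repeats[kmer] = positions
    return repeats


if __name__ == "__main__":
    print(repeated_kmers("ACGTACGTACGT", 4))
```

The only large structure here is one array of integer positions, which is the kind of saving over a pointer-heavy tree that the abstract alludes to. A practical version would likely avoid building a string per comparison, for example by packing each k-mer into an integer, but that choice is also an assumption rather than something the abstract states.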