Algorithms on strings, trees, and sequences: computer science and computational biology
Finding motifs using random projections
RECOMB '01 Proceedings of the fifth annual international conference on Computational biology
Fast algorithms for selecting specific siRNA in complete mRNA data
WABI'07 Proceedings of the 7th international conference on Algorithms in Bioinformatics
RNAi, short for RNA interference, is a phenomenon that inhibits the expression of genes, and it is widely adopted in laboratories for studying pathways and determining gene function. Recent studies have shown that RNAi could be used to treat diseases such as cancers and some genetic disorders in which down-regulating a protein could prevent or stop progression of the disease. In [7], the authors discuss the problem of detecting endogenous dsRNA control elements and their corresponding mRNA targets, i.e., the genes under RNAi control by degradation, in complete genomes using a suffix tree data structure. While that algorithm identifies triple repeats in the genome sequence in linear time, its very high memory requirement (12 GB for the C. elegans genome of size 100 Mbp) becomes a bottleneck for processing higher-order genomes. In this paper, we give algorithms that are more space- and time-efficient in practice than the suffix-tree-based algorithm. Our algorithms are based on simple array data structures and use basic sorting techniques to identify the desired patterns in a given genome sequence. For the C. elegans genome, we achieve a speedup by a factor of 23 and a reduction in memory requirement by a factor of 12 over the suffix tree approach, making it possible to process higher-order genomes when detecting such endogenous controls and targets for RNAi by degradation.
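The abstract does not spell out the array-and-sort algorithms, so the following is only a minimal illustrative sketch of the general idea of finding repeated substrings with a sorted array instead of a suffix tree. It is not the authors' method. The function name `repeated_kmers`, the fixed length `k`, and the reading of a "repeat" as a k-mer occurring at least three times are all assumptions made for illustration.

```python
from itertools import groupby


def repeated_kmers(seq, k, min_count=3):
    """Return {kmer: [start positions]} for k-mers seen at least min_count times.

    Hypothetical helper for illustration only; not the paper's algorithm.
    """
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    # A plain array of start positions, sorted by the k-mer starting at each one.
    positions = sorted(range(n), key=lambda i: seq[i:i + k])
    repeats = {}
    # After sorting, equal k-mers are adjacent, so a single scan groups them.
    for kmer, group in groupby(positions, key=lambda i: seq[i:i + k]):
        starts = sorted(group)
        if len(starts) >= min_count:
            repeats[kmer] = starts
    return repeats
```

For example, `repeated_kmers("ACGTACGTACGT", 4)` reports the single k-mer `ACGT` at positions 0, 4, and 8. This sketch stores only one integer per sequence position plus the sequence itself, which is the kind of saving over a pointer-heavy suffix tree that the abstract alludes to; a practical implementation would avoid materializing string slices as sort keys.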