Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Scaling and related techniques for geometry problems
STOC '84 Proceedings of the sixteenth annual ACM symposium on Theory of computing
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Compressed representations of sequences and full-text indexes
ACM Transactions on Algorithms (TALG)
Bioinformatics
Compressed indexing and local alignment of DNA
Bioinformatics
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Bioinformatics
Storage and Retrieval of Individual Genomes
RECOMB 2'09 Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology
Bioinformatics
Indexing finite language representation of population genotypes
WABI'11 Proceedings of the 11th international conference on Algorithms in bioinformatics
Approximate all-pairs suffix/prefix overlaps
Information and Computation
Multi-pattern matching with bidirectional indexes
Journal of Discrete Algorithms
Hi-index | 0.00 |
Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. Same mechanism can also be used for mapping mate-pair reads.