We propose a way to index population genotype information together with the complete genome sequence, so that the index can be used to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account. This is achieved by converting a multiple alignment of individual genomes into a finite automaton that recognizes all strings that can be read from the alignment by switching from one sequence to another at any position. The finite automaton is indexed with an extension of the Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The index stays small because the individual genomes are highly similar. The index has applications in variation calling and in primer design.
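To make the automaton construction more concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not spell out: states are taken to be (column, character) pairs, so identical characters in the same alignment column are merged, and gaps are written as `-`. The function names `build_automaton` and `occurs` are hypothetical. The search is a naive frontier traversal used only to illustrate which strings the automaton accepts; it is not the Burrows-Wheeler-based index the abstract describes.

```python
from collections import defaultdict


def build_automaton(rows, gap="-"):
    """Build a recombination automaton from a multiple alignment.

    Assumption: a state is a (column, character) pair, so rows that agree
    in a column share a state there, which is what lets a path switch rows.
    Returns a dict mapping each state to the set of its successor states.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows must have equal length")
    edges = defaultdict(set)
    for row in rows:
        prev = None
        for col, ch in enumerate(row):
            if ch == gap:
                continue  # gaps contribute no state; the edge skips over them
            node = (col, ch)
            edges[node]  # register the state even if it has no successors
            if prev is not None:
                edges[prev].add(node)
            prev = node
    return dict(edges)


def occurs(edges, pattern):
    """Naive check: does some path in the automaton spell `pattern`?"""
    if not pattern:
        return True
    # Start from every state labelled with the first pattern character.
    frontier = {n for n in edges if n[1] == pattern[0]}
    for ch in pattern[1:]:
        # Advance along edges whose target carries the next character.
        frontier = {m for n in frontier for m in edges[n] if m[1] == ch}
        if not frontier:
            return False
    return bool(frontier)


rows = ["ACGTA", "AGGTC"]
automaton = build_automaton(rows)
found_recombinant = occurs(automaton, "ACGTC")  # starts on row 1, ends on row 2
found_invalid = occurs(automaton, "ACCTA")      # no row has C in column 2
```

In this toy alignment, `ACGTC` appears in neither input row but is accepted because the two rows share a state in columns 0, 2, and 3, so a path can switch rows there. Merging by column is one plausible reading of "switching the sequence at any time"; a different state definition would give a different accepted language.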