Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
q-gram based database searching using a suffix array (QUASAR)
RECOMB '99 Proceedings of the third annual international conference on Computational molecular biology
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Better Filtering with Gapped q-Grams
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Efficient q-gram filters for finding all ε-matches over a given length
RECOMB'05 Proceedings of the 9th Annual international conference on Research in Computational Molecular Biology
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Discrete Applied Mathematics
Hi-index | 0.00 |
Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: preprocessing with Nimbus a data set where one wants to find functional elements using a multiple local alignment tool such as Glam, the overall execution time can be reduced from 7.5 hours to 2 minutes.