We present a fast algorithm for sequence clustering and searching that works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with protein sequences and with very long sequences such as entire chromosomes. We present a theoretical study of our approach and provide experimental results.
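To make the idea of work proportional to shared subwords concrete, the following is a minimal Python sketch and not the algorithm of the paper. It assumes a simple stand-in similarity (two sequences are linked when they share at least `min_shared` distinct subwords of length `k`), a plain dictionary index in place of whatever index structure the authors use, and a union-find structure to merge linked sequences. The function name and the parameters `k` and `min_shared` are hypothetical.

```python
from collections import defaultdict


def cluster_by_shared_subwords(seqs, k=4, min_shared=1):
    """Group sequences sharing at least `min_shared` distinct length-k subwords.

    Illustrative only: the similarity measure here is an assumption,
    not the strictly defined measure of the paper.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Index: subword -> ids of the sequences that contain it.
    index = defaultdict(set)
    for sid, s in enumerate(seqs):
        for pos in range(len(s) - k + 1):
            index[s[pos:pos + k]].add(sid)

    # Count shared subwords per pair. Pairs with nothing in common are
    # never visited, so the work depends on the amount of sharing.
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                shared[(ordered[a], ordered[b])] += 1

    # Merge every pair that meets the threshold.
    for (a, b), n in shared.items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for sid in range(len(seqs)):
        clusters[find(sid)].append(sid)
    return sorted(clusters.values())
```

For example, `cluster_by_shared_subwords(["ACGTACGT", "TTACGTAA", "GGGGCCCC", "CCCCGGGG"], k=4)` groups the first two sequences together and the last two together, since each pair shares several 4-letter subwords while the two groups share none. A very common subword would make the pair-counting loop expensive in this sketch; a practical implementation would need to handle that case.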