Approximate string matching on large DNA sequence data is very important in bioinformatics. Some studies have shown that the suffix tree is an efficient data structure for approximate string matching, and that it performs better than the suffix array if the data structure can be stored entirely in memory. However, our study finds that the suffix array is much better than the suffix tree for indexing DNA sequences, since the data structure has to be created and stored on disk due to its size. We propose a novel auxiliary data structure which greatly improves the efficiency of the suffix array for the approximate string matching problem in the external memory model. The second problem we have tackled is parallel approximate matching in DNA sequences. We propose two novel parallel algorithms for this problem and implement them on a PC cluster. The results show that when the error allowed is small, a direct partitioning of the array over the machines in the cluster is the more efficient approach. On the other hand, when the error allowed is large, partitioning the data over the machines is the better approach.
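
For readers unfamiliar with the setting, the following is a minimal in-memory sketch of approximate matching over a suffix array. It is not the auxiliary data structure or the parallel algorithms proposed here, whose details the abstract does not give. It assumes that errors are counted as mismatches (Hamming distance) and uses the standard pigeonhole filter: a pattern with at most k mismatches, split into k + 1 pieces, must contain at least one piece that matches exactly. All function names are hypothetical, and the naive construction is only suitable for short strings.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; illustration only,
    # far too slow and memory-hungry for genome-scale input.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    # Binary search for the contiguous block of suffixes whose first
    # len(pattern) characters equal the pattern.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def find_with_mismatches(text, sa, pattern, k):
    # Assumption: "error" means mismatches only (no insertions or deletions).
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than the error bound k")
    pieces = k + 1
    bounds = [m * j // pieces for j in range(pieces + 1)]
    hits = set()
    for j in range(pieces):
        off, end = bounds[j], bounds[j + 1]
        # Exact hits of one piece are candidate alignments for the whole pattern.
        for pos in find_exact(text, sa, pattern[off:end]):
            start = pos - off
            if start < 0 or start + m > len(text):
                continue
            window = text[start:start + m]
            if sum(a != b for a, b in zip(window, pattern)) <= k:
                hits.add(start)
    return sorted(hits)


text = "ACGTACGTTACG"
sa = build_suffix_array(text)
print(find_exact(text, sa, "ACG"))
print(find_with_mismatches(text, sa, "ACGA", 1))
```

In this sketch each exact lookup touches only a logarithmic number of array entries, which hints at why an array layout can be attractive on disk; the number of candidates to verify grows with k, which is one plausible reason the best partitioning strategy could depend on the error bound.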