The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in bioinformatics. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methods required to analyze these data have also become increasingly complex, requiring fast queries over multiple genomes. In this paper we present Parallel Continuous Flow (PCF), a parallel suffix tree construction method that is suitable for very long strings. We tested our method by constructing the suffix tree of the entire human genome (about 3 GB). We show that PCF scales gracefully as the size of the input string grows. Our method achieves a parallel efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 nodes.
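To make the efficiency figures concrete, the arithmetic below assumes they follow the usual definition of parallel efficiency, E(p) = S(p)/p, where S(p) = T_1/T_p is the speedup of a p-processor run over a single-processor run. The single-processor baseline is an assumption; the abstract does not state which baseline was used. Under that reading, the reported efficiencies correspond to these speedups:

$$
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
\quad\Longrightarrow\quad
S(p) = p\,E(p)
$$

$$
S(36) = 36 \times 0.90 = 32.4, \qquad S(172) = 172 \times 0.55 = 94.6
$$

So, under this assumed definition, going from 36 to 172 processors still roughly triples the speedup (94.6 / 32.4 ≈ 2.9), even though efficiency drops from 90% to 55%.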