In this paper we describe algorithms for computing the BWT and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight: for an input of size n, they use only n bits of disk working space, whereas all previous approaches use Θ(n log n) bits. Moreover, our algorithms access disk data only via sequential scans, so they take full advantage of modern disks, on which sequential accesses are much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT that is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product of internal-memory space and number of passes over the disk data, showing that our algorithms are within an O(log n) factor of optimal.
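For readers unfamiliar with the transform itself, the following is a minimal in-memory sketch of the BWT and its inverse. It is not the lightweight or scan-based algorithm of the paper: it sorts all rotations naively and uses far more than n bits of working space. The function names `bwt` and `inverse_bwt` and the `$` end-of-text sentinel are illustrative assumptions, not taken from the paper.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustration only; quadratic space, unlike the lightweight algorithms
    described in the abstract. Assumes the sentinel is smaller than every
    character of the text.
    """
    if any(c <= sentinel for c in text):
        raise ValueError("sentinel must be smaller than every text character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, sentinel="$"):
    """Invert the BWT using the last-to-first (LF) mapping."""
    n = len(last)
    # A stable sort of positions by character yields the first column;
    # order[j] is the position in `last` of the j-th first-column character.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for j, i in enumerate(order):
        lf[i] = j
    # Row 0 begins with the sentinel, so last[0] is the final text character.
    # Following LF walks the text backwards.
    out = []
    i = 0
    for _ in range(n - 1):
        out.append(last[i])
        i = lf[i]
    return "".join(reversed(out))


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

The inverse visits positions of the transformed string in an order determined by the LF mapping, which in general jumps around the string. That pattern of random accesses is what a scan-based inversion algorithm has to avoid.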