Computing the longest common prefix array based on the Burrows-Wheeler transform

Authors:
Timo Beller; Simon Gog; Enno Ohlebusch; Thomas Schnattinger
Affiliations:
Institute of Theoretical Computer Science, University of Ulm, 89069 Ulm, Germany (all authors)
Venue:
Journal of Discrete Algorithms
Year:
2013

Abstract

Many sequence analysis tasks can be accomplished with a suffix array, and several of them additionally need the longest common prefix array. In large-scale applications, suffix arrays are being replaced with full-text indexes that are based on the Burrows-Wheeler transform. In this paper, we present the first algorithm that computes the longest common prefix array directly on the wavelet tree of the Burrows-Wheeler transformed string. It runs in linear time, and a practical implementation requires approximately 2.2 bytes per character.
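The abstract does not spell out the procedure, so the following Python sketch illustrates one way to obtain the LCP array from the BWT alone: a breadth-first enumeration of suffix-array intervals by backward search, where each LCP entry is fixed the first time an interval boundary is discovered. Treat this as an illustration under stated assumptions, not the authors' implementation. Plain prefix-count tables stand in for the wavelet tree's rank queries (so memory is O(n * sigma), far above 2.2 bytes per character), the text is assumed to end with a unique sentinel `$` that sorts before every other character, and the names `suffix_array_and_bwt`, `lcp_from_bwt`, `get_intervals` and `naive_lcp` are hypothetical.

```python
from collections import deque


def suffix_array_and_bwt(text):
    """Naive construction, for illustration only.

    Assumes `text` ends with a unique sentinel '$' that sorts before every
    other character, so sorting suffixes equals sorting rotations.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    return sa, bwt


def lcp_from_bwt(bwt):
    """LCP array computed from the BWT only (hypothetical helper).

    Returns a list of length n with lcp[0] = -1 and, for k >= 1,
    lcp[k] = length of the longest common prefix of suffixes SA[k-1] and SA[k].
    """
    n = len(bwt)
    alphabet = sorted(set(bwt))
    # C[c] = number of text characters smaller than c (the BWT is a permutation of the text).
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt.count(c)
    # occ[c][k] = occurrences of c in bwt[:k]; a stand-in for wavelet-tree rank queries.
    occ = {c: [0] * (n + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)

    def get_intervals(i, j):
        """Backward search: yield the cw-interval [lb..rb] for every c occurring in bwt[i..j]."""
        for c in alphabet:
            lb = C[c] + occ[c][i]
            rb = C[c] + occ[c][j + 1] - 1
            if lb <= rb:
                yield lb, rb

    lcp = [None] * (n + 1)          # None marks an entry that is not yet defined
    lcp[0] = lcp[n] = -1            # sentinels at both ends
    queue = deque([(0, n - 1, 0)])  # interval of the empty string, which has length 0
    while queue:
        i, j, ell = queue.popleft()  # interval of some string w with |w| = ell
        for lb, rb in get_intervals(i, j):
            if lcp[rb + 1] is None:
                # BFS visits strings in order of length, so the first interval ending
                # at rb belongs to a string of length ell + 1: SA[rb] and SA[rb+1] share ell chars.
                lcp[rb + 1] = ell
                queue.append((lb, rb, ell + 1))
    return lcp[:n]


def naive_lcp(text, sa):
    """Reference LCP array by direct character comparison."""
    def lcp_len(a, b):
        k = 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        return k
    return [-1] + [lcp_len(sa[k - 1], sa[k]) for k in range(1, len(sa))]


if __name__ == "__main__":
    sa, bwt = suffix_array_and_bwt("banana$")
    print(bwt)                # annb$aa
    print(lcp_from_bwt(bwt))  # [-1, 0, 1, 3, 0, 0, 2]
```

Each newly set LCP entry enqueues exactly one interval, so at most n intervals are processed; with a wavelet tree in place of the count tables, each `get_intervals` call would instead cost time proportional to the number of distinct characters it reports times the tree height.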