Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Membership in Constant Time and Almost-Minimum Space
SIAM Journal on Computing
New text indexing functionalities of the compressed suffix arrays
Journal of Algorithms
Journal of the ACM (JACM)
Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
SIAM Journal on Computing
Cache-oblivious string B-trees
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
ACM Computing Surveys (CSUR)
Linear work suffix array construction
Journal of the ACM (JACM)
Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets
ACM Transactions on Algorithms (TALG)
The talk is a guided tour of text indexing data structures, suffix sorting, and data compression. We discuss how they share common problems on text suffixes, showing the interplay among some of the algorithmic techniques that have been devised so far. In the following, given a text T = T[1,n] of n symbols, we denote by s_i its suffix s_i = T[i,n] for 1 ≤ i ≤ n. A text indexing data structure stores the suffixes s_1, s_2, ..., s_n of T at preprocessing time, in a suitable format that supports pattern matching queries over T. For example, given a pattern string P of m symbols, one type of query computes how many times P appears in T; its O(m + log n) time complexity in the comparison model compares favorably with the O(m + n) cost required by full text scanning [8]. Notable examples of text indexing data structures are suffix trees [10,14] and suffix arrays [9] for use in main memory, and string B-trees [4] and cache-oblivious string B-trees [1] for use in external and hierarchical memory, to name a few. Suffix sorting requires arranging the suffixes s_1, s_2, ..., s_n in lexicographic order. This is the major computational bottleneck in suffix-based algorithms, and can be solved in O(n log n) time in the comparison model (e.g. [7]). Once the suffixes are sorted, it is not difficult to build a text indexing data structure in (nearly) linear time. Suffix sorting is also crucial in data compression, as witnessed by the importance of the Burrows-Wheeler transform [3]. The techniques adopted in the aforementioned topics have converged in several ways into the rich fields of compressed text indexing [5,6,11,13] and succinct data structures [2,12], with some old and new open problems.
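To make the three notions above concrete, here is a minimal Python sketch, not taken from the talk. It sorts the suffixes naively, counts the occurrences of a pattern by binary search over the sorted suffixes, and reads the Burrows-Wheeler transform off the same sorted order. The function names (suffix_array, count_occurrences, bwt) and the "$" end marker are illustrative assumptions. Positions are 0-based here, whereas the text above uses 1-based positions. The sketch does not attain the bounds quoted above: the naive sort compares whole suffixes, and each query costs O(m log n) character comparisons, since reaching O(m + log n) requires additional longest-common-prefix information that is omitted here.

```python
def suffix_array(text: str) -> list[int]:
    """Return the 0-based start positions of the suffixes of text,
    listed in lexicographic order of the suffixes.

    Naive construction: a comparison sort on explicit suffix copies.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Count how many times pattern occurs in text, given its suffix array sa.

    The suffixes that start with pattern form a contiguous range of sa,
    so two binary searches on the length-m prefixes locate its two ends.
    """
    m = len(pattern)

    # First index whose suffix prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First index whose suffix prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def bwt(text: str, end_marker: str = "$") -> str:
    """Burrows-Wheeler transform computed from the sorted suffixes.

    Assumption: end_marker does not occur in text and compares smaller
    than every symbol of text.
    """
    t = text + end_marker
    sa = suffix_array(t)
    # Each output symbol is the one preceding a sorted suffix; index -1
    # wraps around to the end marker for the suffix starting at position 0.
    return "".join(t[i - 1] for i in sa)


if __name__ == "__main__":
    T = "banana"
    sa = suffix_array(T)
    print(sa)
    print(count_occurrences(T, sa, "ana"))
    print(bwt(T))
```

Because the sorted order places all suffixes sharing a prefix next to each other, the same array serves both the counting query and the transform, which is the kind of interplay between indexing and compression the abstract refers to.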