Linear Time Suffix Array Construction Using D-Critical Substrings
CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Efficiently extracting operational profiles from execution logs using suffix arrays
ISSRE'09 Proceedings of the 20th IEEE international conference on software reliability engineering
We propose a novel, fast, and space-efficient linear-time suffix array construction algorithm (SACA) that breaks the performance and design bottlenecks of existing linear SACAs. By sampling fixed-size d-critical substrings to divide and conquer the problem, our new algorithm is very simple: a fully functioning sample implementation takes only about 100 lines of C code. Experimental results on the Canterbury and Manzini-Ferragina corpora show that our algorithm outperforms both the Kärkkäinen-Sanders (KS) and the Ko-Aluru (KA) algorithms: compared with KS, ours can be more than twice as fast and use more than 50% less space; compared with KA, ours can be 9% faster and use 40% less space. To approach the lightweight space extreme, we further improve our linear algorithm so that it uses an extra working space of only 0.25n+O(1) bytes to construct the suffix array of any size-n string over a constant or integer alphabet, where the characters of an integer alphabet are in [0..n-1]. Besides using less space, our lightweight linear algorithm still runs more than 1.5 times faster than the KS algorithm in the experiments.
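
For readers unfamiliar with the output that any SACA must produce, the following is a minimal reference sketch in C. It is not the d-critical algorithm described above: it simply sorts all suffixes with `qsort`, which is far slower than linear time in the worst case, and is useful only as a correctness baseline on small inputs. The names `naive_suffix_array` and `cmp_suffix` are hypothetical and chosen for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Text shared with the comparator, because qsort passes no user data.
   This makes the sketch non-reentrant, which is acceptable for a baseline. */
static const unsigned char *g_text;
static size_t g_len;

/* Compare the suffixes starting at positions *a and *b lexicographically. */
static int cmp_suffix(const void *a, const void *b) {
    size_t i = (size_t)*(const int *)a;
    size_t j = (size_t)*(const int *)b;
    if (i == j) return 0;
    size_t li = g_len - i, lj = g_len - j;
    size_t m = li < lj ? li : lj;
    int c = memcmp(g_text + i, g_text + j, m);
    if (c != 0) return c;
    /* One suffix is a prefix of the other: the shorter one sorts first. */
    return li < lj ? -1 : 1;
}

/* Hypothetical helper (not from the paper): fill sa[0..n-1] with the
   starting positions of the suffixes of s[0..n-1] in sorted order.
   Comparison-based sorting, NOT a linear-time construction. */
void naive_suffix_array(const unsigned char *s, size_t n, int *sa) {
    for (size_t i = 0; i < n; i++) sa[i] = (int)i;
    g_text = s;
    g_len = n;
    qsort(sa, n, sizeof sa[0], cmp_suffix);
}
```

For the string "banana" (n = 6), the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", so the resulting suffix array is 5, 3, 1, 0, 4, 2. A linear SACA such as the one proposed here must return the same array; the difference lies only in running time and working space.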