Complete inverted files for efficient text retrieval and analysis
Journal of the ACM (JACM)
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Discovering characteristic expressions in literary works
Theoretical Computer Science
A Corpus for the Evaluation of Lossless Compression Algorithms
DCC '97 Proceedings of the Conference on Data Compression
DCC '99 Proceedings of the Conference on Data Compression
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
On-line construction of compact directed acyclic word graphs
Discrete Applied Mathematics
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Unsupervised spam detection based on string alienness measures
DS'07 Proceedings of the 10th international conference on Discovery science
Minimum Unique Substrings and Maximum Repeats
Fundamenta Informaticae - Theory that Counts: To Oscar Ibarra on His 70th Birthday
Computing regularities in strings: A survey
European Journal of Combinatorics
Space-Efficient computation of maximal and supermaximal repeats in genome sequences
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
This paper considers the enumeration of the substring equivalence classes introduced by Blumer et al. [1], who used them to define an index structure called the compact directed acyclic word graph (CDAWG). In text analysis, these equivalence classes are useful because they group together redundant substrings that have essentially identical occurrences. We show how to enumerate the equivalence classes using suffix arrays. Our algorithm uses the rank and lcp arrays to traverse the corresponding suffix tree and needs no additional data structure. It runs in time linear in the length of the input string. We report experimental results comparing the running time and space consumption of our algorithm with suffix tree based and CDAWG based approaches.
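As a rough illustration of what traversing a suffix tree with only the rank and lcp arrays can look like, the following Python sketch builds a suffix array, computes the lcp array with the linear-time method of Kasai et al., and enumerates lcp-intervals bottom-up with a stack. It is not the paper's algorithm: it lists only the internal nodes of the suffix tree (the repeated, right-extensible substrings) and does not group them into equivalence classes. The function names `suffix_array`, `lcp_array`, and `lcp_intervals` are hypothetical, and the suffix array is built by naive sorting for brevity rather than in linear time.

```python
def suffix_array(s):
    # Naive construction by sorting suffixes (assumption: for illustration
    # only; this is not a linear-time construction).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai et al.: lcp[i] is the length of the longest common prefix of
    # the suffixes starting at sa[i - 1] and sa[i]; lcp[0] is 0.
    n = len(s)
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i  # rank is the inverse permutation of sa
    lcp = [0] * n
    h = 0
    for p in range(n):
        r = rank[p]
        if r > 0:
            q = sa[r - 1]
            while p + h < n and q + h < n and s[p + h] == s[q + h]:
                h += 1
            lcp[r] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


def lcp_intervals(lcp):
    # Bottom-up enumeration of lcp-intervals as (depth, left, right).
    # Each interval corresponds to one internal node of the suffix tree.
    n = len(lcp)
    out = []
    stack = [(0, 0)]  # (lcp value, left boundary); the root is always present
    for i in range(1, n + 1):
        cur = lcp[i] if i < n else -1  # sentinel flushes the stack at the end
        left = i - 1
        while stack and stack[-1][0] > cur:
            depth, left = stack.pop()
            out.append((depth, left, i - 1))
        if i < n and stack[-1][0] < cur:
            stack.append((cur, left))
    return out


if __name__ == "__main__":
    text = "banana$"
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    for depth, left, right in lcp_intervals(lcp):
        start = sa[left]
        print(repr(text[start:start + depth]), right - left + 1)
```

Each reported triple gives the string depth of a node and the suffix array range of its occurrences, so the substring itself can be read off as `text[sa[left]:sa[left] + depth]`. Grouping these nodes into the equivalence classes discussed above would additionally require checking extensibility to the left, which this sketch leaves out.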