Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Color Set Size Problem with Application to String Matching
CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Computing suffix links for suffix trees and arrays
Information Processing Letters
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Optimal string mining under frequency constraints
PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
A new succinct representation of RMQ-information and improvements in the enhanced suffix array
ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies
A Space Efficient Solution to the Frequent String Mining Problem for Many Databases
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Distributed string mining for high-throughput sequencing data
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics
Sequential pattern mining from trajectory data
Proceedings of the 17th International Database Engineering & Applications Symposium
String analysis by sliding positioning strategy
Journal of Computer and System Sciences
Hi-index | 0.00 |
The frequent string mining problem is to find all substrings of a collection of string databases which satisfy database specific minimum and maximum frequency constraints. Our contribution improves the existing linear-time algorithm for this problem in such a way that the peak memory consumption is a constant factor of the size of the largest database of strings. We show how the results for each database can be stored implicitly in space proportional to the size of the database, making it possible to traverse the results in lexicographical order. Furthermore, we present a linear-time algorithm which calculates the intersection of the results of different databases. This algorithm is based on an algorithm to merge two suffix arrays, and our modification allows us to also calculate the LCP table of the resulting suffix array during the merging.