Fast Frequent String Mining Using Suffix Arrays

Authors:
Johannes Fischer;Volker Heun;Stefan Kramer
Affiliations:
Universität München;Universität München;TU München
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 8
Cited 2

Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
A Theory of Inductive Query Answering

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Replacing suffix trees with enhanced suffix arrays

Journal of Discrete Algorithms - SPIRE 2002
Engineering a Lightweight Suffix Array Construction Algorithm

Algorithmica
Linear work suffix array construction

Journal of the ACM (JACM)
An efficient algorithm for mining string databases under constraints

KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases

Efficient String Mining under Constraints Via the Deferred Frequency Index

ICDM '08 Proceedings of the 8th industrial conference on Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects
Optimal string mining under frequency constraints

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a method to mine strings that are frequent in one database and infrequent in another. The method uses suffix- and lcp-arrays that can be computed extremely fast and space efficiently, and further exhibit a good locality behavior. Experiments with several biologically relevant data sets show that our approach outperforms existing methods in terms of time and space.