Space Efficient String Mining under Frequency Constraints

Authors:
Johannes Fischer; Veli Mäkinen; Niki Välimäki
Venue:
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Year:
2008

Citing 0
Cited 7

Mining class-correlated patterns for sequence labeling
DS'10 Proceedings of the 13th international conference on Discovery science

Mining interestingness measures for string pattern mining
Knowledge-Based Systems

Efficient Maximal Repeat Finding Using the Burrows-Wheeler Transform and Wavelet Tree
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Distributed string mining for high-throughput sequencing data
WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics

On (dynamic) range minimum queries in external memory
WADS'13 Proceedings of the 13th international conference on Algorithms and Data Structures

String analysis by sliding positioning strategy
Journal of Computer and System Sciences

Multi-pattern matching with bidirectional indexes
Journal of Discrete Algorithms

Abstract

Let $\db_1$ and $\db_2$ be two databases (i.e., multisets) of $d$ strings over an alphabet $\Sigma$, with overall length $n$. We study the problem of mining discriminative patterns between $\db_1$ and $\db_2$, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in sharp contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is $O(n \log n)$ bits, which is not optimal for $|\Sigma