Let $\db_1$ and $\db_2$ be two databases (i.e., multisets) of $d$ strings over an alphabet $\Sigma$, with overall length $n$. We study the problem of mining discriminative patterns between $\db_1$ and $\db_2$, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in stark contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is $O(n \log n)$ bits, which is not optimal for $|\Sigma