A comparison of imperative and purely functional suffix tree constructions
ESOP '94 Selected papers of ESOP '94, the 5th European symposium on Programming
Efficient mining of emerging patterns: discovering trends and differences
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Suffix arrays: a new method for on-line string searches
SODA '90 Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms
Reducing the space requirement of suffix trees
Software—Practice & Experience
Data Mining Techniques: For Marketing, Sales, and Customer Support
Making Use of the Most Expressive Jumping Emerging Patterns for Classification
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
A Theory of Inductive Query Answering
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Fast Frequent String Mining Using Suffix Arrays
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Looking for monotonicity properties of a similarity constraint on sequences
Proceedings of the 2006 ACM symposium on Applied computing
Mining minimal distinguishing subsequence patterns with gap constraints
Knowledge and Information Systems
Frequent pattern mining: current status and future directions
Data Mining and Knowledge Discovery
An efficient algorithm for mining string databases under constraints
KDID'04 Proceedings of the Third international conference on Knowledge Discovery in Inductive Databases
Optimal string mining under frequency constraints
PKDD'06 Proceedings of the 10th European conference on Principles and Practice of Knowledge Discovery in Databases
Theoretical and practical improvements on the RMQ-Problem, with applications to LCA and LCE
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
We propose a general approach for frequency-based string mining, which has many applications, e.g., in contrast data mining. Our contribution is a novel algorithm based on a deferred data structure. Despite its simplicity, our approach is up to 4 times faster and uses about half the memory of the best-known algorithm of Fischer et al. Applications in various string domains, e.g., natural language, DNA, or protein sequences, demonstrate the improvement of our algorithm.
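
To make the problem statement concrete, the following is a minimal sketch of frequency-constrained string mining in the contrast setting: find every substring that occurs in at least a given number of strings of one database and in at most a given number of strings of another. This is a naive enumeration for illustration only, not the deferred-data-structure algorithm described above. The function names, the thresholds `min_pos` and `max_neg`, and the toy databases are all assumptions introduced here.

```python
from collections import Counter


def substring_support(database):
    """Count, for every substring, how many strings in `database` contain it."""
    support = Counter()
    for text in database:
        # A set ensures each substring is counted at most once per string.
        seen = {
            text[i:j]
            for i in range(len(text))
            for j in range(i + 1, len(text) + 1)
        }
        support.update(seen)
    return support


def mine_contrast_substrings(positive, negative, min_pos, max_neg):
    """Return substrings found in >= min_pos positive strings
    and in <= max_neg negative strings (hypothetical thresholds)."""
    pos_support = substring_support(positive)
    neg_support = substring_support(negative)
    return sorted(
        s
        for s, count in pos_support.items()
        if count >= min_pos and neg_support[s] <= max_neg
    )


# Toy databases, invented for this example.
positive = ["abcab", "cabd", "xab"]
negative = ["bca", "dcb"]
result = mine_contrast_substrings(positive, negative, min_pos=3, max_neg=0)
print(result)  # prints ['ab']
```

The sketch enumerates all substrings of every string, so its cost grows quadratically with string length; index structures such as suffix trees or suffix arrays are the usual way to avoid that blow-up on realistic inputs.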