Efficient String Mining under Constraints Via the Deferred Frequency Index

  • Authors:
  • David Weese;Marcel H. Schulz

  • Affiliations:
  • Department of Computer Science, Free University of Berlin, Berlin, Germany 14195;Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany and, International Max Planck Research School for Computational Biolo ...

  • Venue:
  • ICDM '08 Proceedings of the 8th industrial conference on Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects
  • Year:
  • 2008

Abstract

We propose a general approach to frequency-based string mining, which has many applications, e.g. in contrast data mining. Our contribution is a novel algorithm based on a deferred data structure. Despite its simplicity, our approach is up to 4 times faster and uses about half the memory of the best-known algorithm of Fischer et al. Applications in various string domains, e.g. natural language, DNA, or protein sequences, demonstrate the improvement achieved by our algorithm.
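
To make the problem setting concrete, the following is a minimal brute-force sketch of frequency-based string mining in a contrast setting. It is not the deferred frequency index described in the abstract, nor the algorithm of Fischer et al.; it only illustrates what kind of output such a miner is expected to produce. The function names (`substring_support`, `mine_contrast`) and the threshold parameters (`min_pos`, `max_neg`) are assumptions introduced here for illustration, and support is assumed to mean the fraction of strings in a database that contain the pattern.

```python
from collections import Counter


def substring_support(db):
    """Map each substring to the number of strings in db that contain it."""
    support = Counter()
    for s in db:
        # Collect distinct substrings of s so each string counts at most once.
        seen = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        support.update(seen)
    return support


def mine_contrast(pos, neg, min_pos, max_neg):
    """Return substrings frequent in pos and rare in neg (hypothetical helper).

    min_pos: minimum relative support required in the positive database.
    max_neg: maximum relative support allowed in the negative database.
    """
    pos_sup = substring_support(pos)
    neg_sup = substring_support(neg)
    return sorted(
        p
        for p, count in pos_sup.items()
        if count / len(pos) >= min_pos
        and neg_sup.get(p, 0) / len(neg) <= max_neg
    )


if __name__ == "__main__":
    positive = ["abab", "abc", "xab"]
    negative = ["bcd", "xyz"]
    print(mine_contrast(positive, negative, min_pos=1.0, max_neg=0.0))
```

This enumeration takes time and space quadratic in the length of each string, which is the kind of cost that index-based approaches such as the one proposed here aim to avoid.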