An integrated, generic approach to pattern mining: data mining template library

  • Authors:
  • Vineet Chaoji;Mohammad Al Hasan;Saeed Salem;Mohammed J. Zaki

  • Affiliations:
  • Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180;Computer Science Department, Rensselaer Polytechnic Institute, Troy, USA 12180

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2008

Abstract

Frequent pattern mining (FPM) is an important data mining paradigm for extracting informative patterns such as itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL takes a generic approach to data mining, in which all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types by extending the hierarchy. The mining process itself is configured the same way: the mining approach to use, the data types and formats to mine over, and the back-end storage manager are all specified as a list of properties. This makes the toolkit highly customizable for various applications, as exemplified by the ease with which support for a new pattern type can be added. Experiments on synthetic and public datasets are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL has been publicly released as open-source software ( http://dmtl.sourceforge.net/ ), and has been downloaded by numerous researchers from all over the world.
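
The property-driven design described in the abstract can be illustrated with a minimal C++ sketch. It is not DMTL's actual API: every name here (ItemsetProps, SequenceProps, MemoryStore, Miner, contains) is hypothetical and was chosen only for illustration. The sketch assumes a pattern type is described by a small bundle of compile-time properties (here only whether element order matters) and that the storage back-end is a template parameter, so the same support-counting code serves both itemsets and sequences.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A record or a pattern is a list of integer item ids (illustrative choice).
using Items = std::vector<int>;

// Hypothetical property tags: does the order of elements matter?
struct unordered_tag {};
struct ordered_tag {};

// Hypothetical property bundles, one per pattern type.
struct ItemsetProps  { using order = unordered_tag; };
struct SequenceProps { using order = ordered_tag; };

// Hypothetical in-memory storage policy. A persistent back-end could
// expose the same two member functions and be swapped in unchanged.
class MemoryStore {
public:
    void add(Items record) { records_.push_back(std::move(record)); }
    const std::vector<Items>& records() const { return records_; }
private:
    std::vector<Items> records_;
};

// Unordered containment: every pattern item occurs somewhere in the record.
inline bool contains(const Items& record, const Items& pattern, unordered_tag) {
    return std::all_of(pattern.begin(), pattern.end(), [&](int p) {
        return std::find(record.begin(), record.end(), p) != record.end();
    });
}

// Ordered containment: the pattern is a (possibly gapped) subsequence.
inline bool contains(const Items& record, const Items& pattern, ordered_tag) {
    std::size_t matched = 0;
    for (int x : record) {
        if (matched < pattern.size() && x == pattern[matched]) ++matched;
    }
    return matched == pattern.size();
}

// Generic support counter: the pattern semantics and the storage
// back-end are both selected through template parameters.
template <typename Props, typename Store>
class Miner {
public:
    Miner(const Store& store, std::size_t min_support)
        : store_(store), min_support_(min_support) {
        assert(min_support >= 1);  // a zero threshold would accept everything
    }

    // Number of records that contain the pattern under Props' semantics.
    std::size_t support(const Items& pattern) const {
        std::size_t count = 0;
        for (const Items& record : store_.records()) {
            if (contains(record, pattern, typename Props::order{})) ++count;
        }
        return count;
    }

    bool is_frequent(const Items& pattern) const {
        return support(pattern) >= min_support_;
    }

private:
    const Store& store_;  // the store must outlive the miner
    std::size_t min_support_;
};
```

Under these assumptions, instantiating Miner<ItemsetProps, MemoryStore> and Miner<SequenceProps, MemoryStore> over the same store gives two miners that differ only in their containment test. Adding a new pattern type amounts to defining another property bundle and, where needed, another contains overload, which loosely mirrors the extensibility the abstract attributes to the property hierarchy.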