Mining interestingness measures for string pattern mining

  • Authors:
  • M. Baena-García; R. Morales-Bueno

  • Affiliations:
  • Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071 Málaga, Spain

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

A novel method for detecting interesting patterns in strings is presented. A common way to refine the results of pattern mining algorithms is to apply interestingness measures; however, the appropriate set of measures differs for each domain and problem. The aim of our research was to develop a model that classifies patterns according to their interestingness. The method applies machine learning algorithms to a dataset generated from factor features: each row of the dataset is associated with a factor (substring) of a string and contains the values of several interestingness measures together with contextual information. We also propose a new interestingness measure based on an entropy principle, which improves the classification results. With the proposed method, experts need not configure parameters to obtain interesting patterns. We demonstrate the utility of the method with an example of the results obtained on real data. The datasets and scripts required to reproduce the experiments are available online.
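The abstract describes building one dataset row per factor of a string, holding interestingness measures plus contextual information, and then training a classifier on those rows. A minimal sketch of the row-generation step follows, under stated assumptions: the features used here (occurrence count, support, and the Shannon entropy of the characters that follow each occurrence) are illustrative choices, not the measures used by the authors. In particular, `right_entropy` is a generic branching-entropy feature and is not the entropy-based measure the paper proposes. The names `FactorRow`, `factor_dataset`, and `max_len` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class FactorRow:
    """One dataset row per distinct factor (hypothetical feature set)."""
    factor: str
    length: int
    count: int            # number of (possibly overlapping) occurrences
    support: float        # count / number of windows of this length
    right_entropy: float  # Shannon entropy (bits) of the next-character distribution


def shannon_entropy(counts):
    """Shannon entropy in bits of a frequency distribution stored in a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # factor only occurs at the end of the string: no successors
    return sum(-(c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def factor_dataset(s, max_len=3):
    """Build one FactorRow per distinct factor of s with length <= max_len."""
    counts = Counter()
    followers = defaultdict(Counter)  # factor -> counts of the character after it
    n = len(s)
    for i in range(n):
        for k in range(1, min(max_len, n - i) + 1):
            f = s[i:i + k]
            counts[f] += 1
            if i + k < n:
                followers[f][s[i + k]] += 1
    rows = []
    for f, c in counts.items():
        windows = n - len(f) + 1  # positions where a factor of this length can start
        rows.append(FactorRow(f, len(f), c, c / windows, shannon_entropy(followers[f])))
    rows.sort(key=lambda r: (r.length, r.factor))
    return rows


if __name__ == "__main__":
    for row in factor_dataset("abac", max_len=2):
        print(row)
```

For `"abac"`, the factor `a` occurs twice and is followed once by `b` and once by `c`, so its `right_entropy` is 1 bit. Rows like these, once labeled as interesting or not by an expert, could be passed to any supervised learner; the labeling step and the learner are deliberately left out of this sketch.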