Mining interestingness measures for string pattern mining

  • Authors:
  • Manuel Baena-García; Rafael Morales-Bueno

  • Affiliations:
  • Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain (both authors)

  • Venue:
  • IEA/AIE'10: Proceedings of the 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, Volume Part I
  • Year:
  • 2010

Abstract

In this paper we present a novel method to detect interesting patterns in strings. A common way to refine the results of pattern mining algorithms is to apply interestingness measures, but the set of appropriate measures differs for each domain and problem. The aim of our research is to obtain a model that classifies patterns by interest. The method applies machine learning algorithms to a dataset generated from the features of factors. Each dataset row is associated with a factor (a contiguous substring) of a string and contains the values of different interestingness measures together with contextual information. We also propose a new interestingness measure, based on an entropy principle, which improves the classification results obtained. The proposed method spares experts from having to configure parameters in order to obtain interesting patterns. We demonstrate the utility of the method by giving example results on real data. The datasets and scripts to reproduce the experiments are available online.
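
To make the dataset construction concrete, the following is a minimal sketch in Python of how one row per factor might be built. It is an illustration under stated assumptions, not the authors' implementation: the function names (`factors`, `char_entropy`, `build_rows`), the `max_len` cutoff, and the chosen columns are all hypothetical. In particular, the character-level Shannon entropy used here is a generic stand-in and should not be read as the entropy-based measure proposed in the paper, which the abstract does not define.

```python
from collections import Counter
from math import log2


def factors(s, max_len):
    """Yield every non-empty factor (contiguous substring) of s up to max_len."""
    for i in range(len(s)):
        for j in range(i + 1, min(i + max_len, len(s)) + 1):
            yield s[i:j]


def char_entropy(factor):
    """Shannon entropy in bits of the character distribution inside a factor.

    Assumption: a generic illustrative measure, not the paper's measure.
    """
    n = len(factor)
    return sum((c / n) * log2(n / c) for c in Counter(factor).values())


def build_rows(s, max_len=3):
    """Return one feature row (a dict) per distinct factor of s."""
    freq = Counter(factors(s, max_len))
    rows = []
    for f, count in sorted(freq.items()):
        positions = len(s) - len(f) + 1  # start positions for this length
        rows.append({
            "factor": f,
            "length": len(f),
            "count": count,
            "rel_freq": count / positions,
            "entropy": char_entropy(f),
        })
    return rows


if __name__ == "__main__":
    for row in build_rows("abab", max_len=2):
        print(row)
```

Rows of this shape, once labeled as interesting or not, could be passed to any standard classifier; the choice of learner and of additional contextual columns is left open here because the abstract does not specify them.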