Mining interestingness measures for string pattern mining

Authors:
Manuel Baena-García;Rafael Morales-Bueno
Affiliations:
Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain;Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
Venue:
IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part I
Year:
2010

Abstract

In this paper we present a novel method to detect interesting patterns in strings. A common way to refine the results of pattern mining algorithms is to use interestingness measures, but the set of appropriate measures differs across domains and problems. The aim of our research is to obtain a model that classifies patterns by interest. The method applies machine learning algorithms to a dataset generated from features of factors (substrings). Each dataset row is associated with one factor of a string and contains the values of different interestingness measures together with contextual information. We also propose a new interestingness measure, based on an entropy principle, which improves the classification results obtained. The proposed method spares experts from having to configure parameters in order to obtain interesting patterns. We demonstrate the utility of the method with example results on real data. The datasets and scripts needed to reproduce the experiments are available online.
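
To make the "one dataset row per factor" idea concrete, the following is a minimal sketch of how such a feature table might be built. It is not the authors' implementation: the abstract does not name the measures used, so the columns chosen here (`length`, `frequency`, `support`) and the helper `successor_entropy` are illustrative assumptions. In particular, `successor_entropy` is a generic Shannon-entropy feature and should not be read as the entropy-based measure proposed in the paper.

```python
from collections import Counter
from math import log2


def factors(text, max_len):
    """Count every factor (contiguous substring) of length 1..max_len.

    Overlapping occurrences are counted separately.
    """
    counts = Counter()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            counts[text[i:j]] += 1
    return counts


def successor_entropy(text, factor):
    """Shannon entropy (in bits) of the symbol that follows the factor.

    Hypothetical feature, used only for illustration.
    """
    following = Counter()
    start = text.find(factor)
    while start != -1:
        end = start + len(factor)
        if end < len(text):
            following[text[end]] += 1
        # Advance by one position so overlapping occurrences are found.
        start = text.find(factor, start + 1)
    total = sum(following.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in following.values())


def build_rows(text, max_len=3):
    """Return one feature row (a dict) per distinct factor of the text."""
    rows = []
    for factor, count in sorted(factors(text, max_len).items()):
        positions = len(text) - len(factor) + 1  # possible start positions
        rows.append({
            "factor": factor,
            "length": len(factor),
            "frequency": count,
            "support": count / positions,
            "successor_entropy": successor_entropy(text, factor),
        })
    return rows


if __name__ == "__main__":
    for row in build_rows("abracadabra", max_len=2):
        print(row)
```

In a workflow like the one the abstract describes, each row would additionally carry an "interesting / not interesting" label, and the labelled table would be passed to a standard classifier; that step is omitted here because the abstract does not specify the learning algorithm or the labelling procedure.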