Mining interestingness measures for string pattern mining

Authors:
M. Baena-García; R. Morales-Bueno
Affiliations:
Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071 Málaga, Spain;Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, 29071 Málaga, Spain
Venue:
Knowledge-Based Systems
Year:
2012

Abstract

A novel method for detecting interesting patterns in strings is presented. A common way to refine the results of pattern mining algorithms is to apply interestingness measures. However, the set of appropriate measures differs for each domain and problem. The aim of our research was to develop a model that classifies patterns according to their interestingness. The method applies machine learning algorithms to a dataset generated from the features of factors (contiguous substrings). Each dataset row is associated with one factor of a string and contains its values for different interestingness measures together with contextual information. We also propose a new interestingness measure based on an entropy principle, which improves the classification results obtained. With the proposed method, experts need not configure parameters to obtain interesting patterns. We demonstrate the utility of the method by presenting example results for real data. The datasets and scripts required to reproduce the experiments are available online.
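
The dataset construction described in the abstract (one row per factor, holding measure values and context) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `factors`, `successor_entropy`, and `build_rows`, the column names, and the `max_len` cutoff are all hypothetical. In particular, `successor_entropy` is a generic Shannon entropy over the characters following a factor, used here only as a stand-in; the abstract does not define the paper's own entropy-based measure.

```python
from collections import Counter
from math import log2


def factors(s, max_len):
    """Yield every factor (contiguous substring) of s with length 1..max_len."""
    for i in range(len(s)):
        for j in range(i + 1, min(i + max_len, len(s)) + 1):
            yield s[i:j]


def successor_entropy(s, factor):
    """Shannon entropy (bits) of the characters immediately following `factor` in s.

    Illustrative stand-in only; NOT the measure proposed in the paper.
    """
    followers = Counter()
    start = s.find(factor)
    while start != -1:
        end = start + len(factor)
        if end < len(s):  # occurrences at the very end have no successor
            followers[s[end]] += 1
        start = s.find(factor, start + 1)  # step by 1 so overlaps are counted
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in followers.values())


def build_rows(s, max_len=3):
    """Build one feature row per distinct factor of s (hypothetical columns)."""
    counts = Counter(factors(s, max_len))
    rows = []
    for f, count in sorted(counts.items()):
        rows.append({
            "factor": f,
            "length": len(f),                         # contextual information
            "count": count,                           # overlapping occurrences
            "rel_freq": count / (len(s) - len(f) + 1),
            "successor_entropy": successor_entropy(s, f),
        })
    return rows


if __name__ == "__main__":
    for row in build_rows("abracadabra", max_len=2):
        print(row)
```

Under this sketch, rows labeled as interesting or not interesting by a domain expert could then serve as training data for any standard classifier, which is the role the abstract assigns to the machine learning step.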