Exploratory data analysis leading towards the most interesting simple association rules

  • Authors:
  • Alfonso Iodice D'Enza;Francesco Palumbo;Michael Greenacre

  • Affiliations:
  • Dipartimento di Matematica e Statistica, Università di Napoli Federico II, Italy;Dipartimento di Istituzioni Economiche e Finanziarie, Università di Macerata, Via Crescimbeni, 20 I-62100 Macerata, Italy;Department of Economics and Business, Universitat Pompeu Fabra, Barcelona, Spain

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2008

Quantified Score

Hi-index 0.03

Abstract

Association rules (AR) are one of the most powerful and widely used approaches for detecting regularities and paths in large databases. Rules express relations (in terms of co-occurrence) between pairs of items and are characterized by two measures: support and confidence. Most techniques for finding AR scan the whole data set, evaluate all possible rules, and retain only those whose support and confidence exceed given thresholds. These thresholds must be chosen carefully, so that the retained rules are not all trivial and interesting rules are not discarded. A multistep approach aims at identifying potentially interesting items by exploiting well-known techniques of multidimensional data analysis. In particular, interesting pairs of items have a well-defined degree of association: an item pair is well defined if its degree of co-occurrence is very high with respect to one or more subsets of the considered set of transactions.
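
To make the two measures concrete, the following is a minimal sketch of the conventional threshold-based scan that the abstract describes, not the multistep approach of the paper. It assumes the standard definitions: the support of a rule a -> b is the fraction of transactions containing both items, and its confidence is that count divided by the number of transactions containing a. The function name `pair_rules`, its parameters, and the toy transactions are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def pair_rules(transactions, min_support, min_confidence):
    """Return simple rules a -> b that pass both thresholds.

    Illustrative brute-force scan (assumed, not taken from the paper):
    every ordered pair of items is evaluated against the whole data set.
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))
    # Number of transactions containing each single item.
    count = {i: sum(1 for t in sets if i in t) for i in items}

    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in sets if a in t and b in t)
        support = both / n            # share of transactions with a and b
        confidence = both / count[a]  # share of a-transactions that also contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical toy data: four market-basket transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
print(rules)
```

On this toy data, lowering `min_confidence` admits additional rules involving bread and milk, which illustrates how sensitive the retained rule set is to the choice of thresholds.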