Weighted association rule mining via a graph based connectivity model

  • Authors:
  • Russel Pears;Yun Sing Koh;Gillian Dobbie;Wai Yeap

  • Affiliations:
  • School of Computing and Mathematical Sciences, Private Bag 92006, Auckland 1142, New Zealand;Department of Computer Science, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand;Department of Computer Science, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand;School of Computing and Mathematical Sciences, Private Bag 92006, Auckland 1142, New Zealand

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Abstract

Association rule mining is an important data mining task that discovers relationships among items in a transaction database. Classical association rule mining approaches implicitly assume that an item's importance is determined by its support. In contrast, Weighted Association Rule Mining (WARM) assigns each item a notion of importance, or weight, that is not based solely on item support. Previous approaches to WARM assign item weights subjectively, based on a user's specialized knowledge of the underlying domain. Such approaches are infeasible when millions of items are present in a dataset, or when domain knowledge is unavailable. Furthermore, even when such domain information is available, a weight assignment based on subjective information constrains the knowledge discovered to fit the weights assigned, thus inhibiting the discovery of new trends in the data. In this research we automate the process of weight assignment by formulating a linear model that captures relationships between items. This approach extends prior research based on the Valency model: we expand the field of interaction beyond immediate neighborhoods and show that this leads to significant improvements in performance on a number of different metrics.
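
As a rough illustration of how support-only ranking differs from weighted ranking, the sketch below compares the classical support of an itemset with one common form of weighted support (mean item weight multiplied by support). This is not the model proposed in the paper: the weights here are hand-picked, hypothetical values, whereas the abstract describes deriving them automatically from item relationships. The function names, the toy transactions, and the choice of weighted-support formula are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def weighted_support(
    itemset: Iterable[str],
    transactions: List[FrozenSet[str]],
    weights: Dict[str, float],
) -> float:
    """Mean item weight times support (one common WARM-style definition,
    assumed here for illustration; not taken from the paper)."""
    items = frozenset(itemset)
    mean_weight = sum(weights[i] for i in items) / len(items)
    return mean_weight * support(items, transactions)


# Toy transaction database (hypothetical data).
transactions = [
    frozenset(t)
    for t in (
        {"bread", "milk"},
        {"bread", "caviar"},
        {"bread", "milk", "butter"},
        {"milk", "butter"},
    )
]

# Hand-assigned weights (hypothetical); the paper's aim is to avoid this
# manual step by inferring weights from the data.
weights = {"bread": 0.2, "milk": 0.2, "butter": 0.4, "caviar": 1.0}

common = {"bread", "milk"}
rare = {"bread", "caviar"}

print(support(common, transactions), support(rare, transactions))
print(
    weighted_support(common, transactions, weights),
    weighted_support(rare, transactions, weights),
)
```

In this toy data the itemset containing the rare but heavily weighted item has the lower support yet the higher weighted support, which is the kind of reordering that item weights are meant to produce.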