Scoring and ranking the data using association rules

  • Authors:
  • Bing Liu;Yiming Ma;Ching Kian Wong

  • Affiliations:
  • School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543

  • Venue:
  • Data mining, rough sets and granular computing
  • Year:
  • 2002

Abstract

In many data mining applications, the objective is to find the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer is to buy a particular product. In such applications, it is often too difficult to predict definitively who will and will not buy, because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem: instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, the classification system is made to produce a class probability estimate (or score) for that case. However, existing classification systems aim to find only a small subset of the rules that exist in the data to form a classifier, and this small subset can give only a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem, because it aims to generate all rules in the data and is thus able to give a complete picture of the underlying relationships in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each (new) data case. We propose an efficient technique, called scoring based on associations (SBA), that uses the discovered association rules to produce class probability estimates. Experimental results on both public-domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.
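The abstract does not give the SBA scoring function, so the following is only a minimal Python sketch of the general idea it describes, built on our own assumptions. It mines, by brute force, every class association rule X → buyer whose rule support (training cases containing X and labeled buyer) reaches a minimum threshold. It then scores a new case by plainly averaging the confidences of the rules the case satisfies, falling back to the buyer base rate when no rule matches. The function names `mine_class_rules` and `score`, the toy data, and the averaging step are all hypothetical illustrations, not the paper's SBA method.

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(cases, labels, target=True, min_support=2, min_conf=0.0, max_len=2):
    """Brute-force mining of class association rules X -> target (hypothetical helper).

    Each case is a frozenset of (attribute, value) items. Rule support is the
    number of cases that contain X and carry the target label; confidence is
    that count divided by the number of cases containing X.
    Returns {condition_set: confidence}.
    """
    cover = Counter()  # cases containing the condition set X
    hits = Counter()   # cases containing X and labeled with the target class
    for items, label in zip(cases, labels):
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                cover[combo] += 1
                if label == target:
                    hits[combo] += 1
    rules = {}
    for combo, n in cover.items():
        conf = hits[combo] / n
        if hits[combo] >= min_support and conf >= min_conf:
            rules[frozenset(combo)] = conf
    return rules

def score(case, rules, prior):
    """Average confidence of all rules whose conditions the case satisfies.

    The plain average is an illustrative assumption, not the SBA formula.
    Returns the class prior when no rule matches.
    """
    matched = [conf for cond, conf in rules.items() if cond <= case]
    return sum(matched) / len(matched) if matched else prior

# Hypothetical toy data: True marks a buyer.
cases = [
    frozenset({("age", "young"), ("income", "high")}),
    frozenset({("age", "young"), ("income", "low")}),
    frozenset({("age", "old"), ("income", "high")}),
    frozenset({("age", "old"), ("income", "low")}),
    frozenset({("age", "young"), ("income", "high")}),
]
labels = [True, False, True, False, True]

rules = mine_class_rules(cases, labels, min_support=2)
prior = sum(labels) / len(labels)
for new_case in [frozenset({("age", "old"), ("income", "high")}),
                 frozenset({("age", "young"), ("income", "low")})]:
    print(sorted(new_case), round(score(new_case, rules, prior), 3))
```

Because every qualifying rule is kept rather than a small selected subset, a case can be scored from several overlapping rules at once. This is the intuition the abstract gives for preferring association rules over a compact classifier. How those rules should be combined is exactly what SBA specifies, and the simple average here is only a stand-in.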