Scoring and ranking the data using association rules

Authors:
Bing Liu;Yiming Ma;Ching Kian Wong
Affiliations:
School of Computing, National University of Singapore, 3 Since Drive 2, Singapore 117543;School of Computing, National University of Singapore, 3 Since Drive 2, Singapore 117543;School of Computing, National University of Singapore, 3 Since Drive 2, Singapore 117543
Venue:
Data mining, rough sets and granular computing
Year:
2002

Citing 17
Cited 0

C4.5: programs for machine learning

C4.5: programs for machine learning
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Dynamic itemset counting and implication rules for market basket data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Exploratory mining and pruning optimizations of constrained associations rules

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
BOAT—optimistic decision tree construction

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Pruning and summarizing the discovered associations

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Extending naïve Bayes classifiers using long itemsets

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining association rules with multiple minimum supports

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Constraint-Based Rule Mining in Large, Dense Databases

Data Mining and Knowledge Discovery
SLIQ: A Fast Scalable Classifier for Data Mining

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Online Generation of Association Rules

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Discovery of Multiple-Level Association Rules from Large Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Mining Generalized Association Rules

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases

Quantified Score

Hi-index	0.00

Visualization

Abstract

In many data mining applications, the objective is to find the likelihood that an object belongs to a particular class. For example, in direct marketing, marketers want to know how likely a potential customer will buy a particular product. In such applications, it is often too difficult to predict who will definitely be buyers and non-buyers because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem. Instead of assigning a definite class (e.g., buyer or non-buyer) to a data case representing a potential customer, a classification system is made to produce a class probability estimate (or a score) for the data case. However, existing classification systems only aim to find a small subset of rules that exist in data to form a classifier. This small subset of rules can only give a partial (or biased) picture of the domain. In this paper, we show that association rule mining provides a more powerful solution to the problem because association rule mining aims to generate all rules in data. It is thus able to give a complete picture of the underlying relationships that exist in the domain. This complete set of rules enables us to assign a more accurate class probability estimate (or likelihood) to each (new) data case. An efficient technique that makes use of the discovered association rules to produce class probability estimates is proposed. We call this technique scoring based on associations (or SBA). Experiment results on both public domain data and our real-life application data show that the technique performs significantly better than the state-of-the-art classification system C4.5.