On a confidence gain measure for association rule discovery and scoring

Authors:
Raz Tamir;Yehuda Singer
Affiliations:
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel;Computer Studies Program, Extension of Derby University in Israel, Israel
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2006



Abstract

This article presents a new interestingness measure for association rules called confidence gain (CG). The focus is on extracting human associations rather than associations between market products. The two differ in two main ways. The first is the strong asymmetry of human associations (e.g., the association “shampoo” → “hair” is much stronger than “hair” → “shampoo”), whereas in market products asymmetry is less intuitive and less evident. The second is the background knowledge humans employ when presented with a stimulus (input phrase).

CG compares the local confidence of a given term with its average confidence throughout a given database. CG is found to outperform several association measures because it captures the asymmetric notion of an association (as in the confidence measure) while adding a comparison to an expected confidence (as in the lift measure). The use of average confidence introduces the notion of “background knowledge” into the CG measure.

Various experiments have shown that CG and local confidence gain (a low-complexity version of CG) successfully generate association rules when compared to human free associations. The experiments include a large-scale “free association Turing test” in which human free associations were compared to associations generated by CG and other association measures. Rules discovered by CG were found to be significantly better than those discovered by other measures.

CG can be used for many purposes, such as personalization, sense disambiguation, query expansion, and improving classification performance of small item sets within large databases. Although CG was found to be useful for Internet data retrieval, the results can easily be applied to any type of database.
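
The abstract describes CG only in words, so the following is a minimal sketch rather than the authors' definition. It computes the standard confidence and lift measures on a toy collection of term sets, then a CG-like score that divides the local confidence of a rule by the consequent's average confidence over all other terms. The ratio form, the way the average is taken, and the function names (`average_confidence`, `confidence_gain_sketch`) are assumptions made for illustration; the toy data is also invented.

```python
def confidence(transactions, a, b):
    """conf(a -> b): share of transactions containing a that also contain b."""
    with_a = [t for t in transactions if a in t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if b in t) / len(with_a)


def lift(transactions, a, b):
    """lift(a -> b): confidence divided by the overall support of b (symmetric)."""
    support_b = sum(1 for t in transactions if b in t) / len(transactions)
    return confidence(transactions, a, b) / support_b if support_b else 0.0


def average_confidence(transactions, b):
    """Mean of conf(x -> b) over every other term x.

    ASSUMPTION: this is one possible reading of "average confidence
    throughout a given database"; the abstract does not give the formula.
    """
    terms = set().union(*transactions) - {b}
    if not terms:
        return 0.0
    return sum(confidence(transactions, x, b) for x in terms) / len(terms)


def confidence_gain_sketch(transactions, a, b):
    """ASSUMED CG-like score: local confidence relative to average confidence."""
    avg = average_confidence(transactions, b)
    return confidence(transactions, a, b) / avg if avg else 0.0


# Invented toy data: each set holds the terms of one document.
docs = [
    {"shampoo", "hair"},
    {"shampoo", "hair", "bottle"},
    {"hair", "cut"},
    {"hair", "salon"},
    {"bottle", "water"},
]

for a, b in [("shampoo", "hair"), ("hair", "shampoo")]:
    print(a, "->", b,
          "conf:", confidence(docs, a, b),
          "lift:", lift(docs, a, b),
          "gain:", confidence_gain_sketch(docs, a, b))
```

In this sketch, confidence and the CG-like score depend on the direction of the rule, while lift gives the same value both ways, which is the contrast the abstract draws. A corpus this small says nothing about which measure ranks rules better.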