Discovering frequent itemsets by support approximation and itemset clustering

Authors:
Kuen-Fang Jea;Ming-Yuan Chang
Affiliations:
Department of Computer Science, National Chung-Hsing University, Taichung 40227, Taiwan, ROC;Department of Computer Science, National Chung-Hsing University, Taichung 40227, Taiwan, ROC
Venue:
Data & Knowledge Engineering
Year:
2008

Abstract

To speed up association rule mining, a concept based on support approximation was previously proposed for generating frequent itemsets. However, the mining technique built on this concept can yield unstable accuracy because of approximation error. To overcome this drawback, this paper combines a new clustering method with support approximation and proposes a mining method, named CAC, that discovers frequent itemsets based on the Principle of Inclusion and Exclusion. The clustering technique groups highly similar members to improve the accuracy of support approximation. The hit ratio analysis and experimental results presented in the paper verify that CAC improves accuracy. Without repeatedly scanning the database or storing vast amounts of information in memory, the CAC method is able to mine frequent itemsets with relative stability. Its advantages in both accuracy and performance make CAC an effective and useful technique for discovering frequent itemsets in a database.
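The abstract does not spell out how CAC applies the Principle of Inclusion and Exclusion, so the following is only a minimal sketch of the principle itself on a toy transaction database, not the authors' algorithm. The function names (`support`, `union_count_by_pie`, `support_bounds_3`) and the sample data are assumptions made for illustration. The sketch shows two things: that an alternating sum of subset supports counts the transactions containing at least one of the items, and that the same identity gives lower and upper bounds on the support of a 3-itemset from the supports of its subsets, which is the kind of information a support-approximation scheme can work from without rescanning the database.

```python
from itertools import combinations


def support(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def union_count_by_pie(transactions, items):
    """Count transactions containing at least one of `items`,
    computed as an alternating sum of subset supports."""
    total = 0
    for k in range(1, len(items) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(items, k):
            total += sign * support(transactions, subset)
    return total


def support_bounds_3(transactions, a, b, c):
    """Bound the support of {a, b, c} using only the supports of its
    proper subsets and the database size (illustrative, hypothetical helper)."""
    n = len(transactions)

    def s(*items):
        return support(transactions, items)

    lower = max(
        0,
        s(a, b) + s(a, c) - s(a),
        s(a, b) + s(b, c) - s(b),
        s(a, c) + s(b, c) - s(c),
    )
    upper = min(
        s(a, b),
        s(a, c),
        s(b, c),
        n - s(a) - s(b) - s(c) + s(a, b) + s(a, c) + s(b, c),
    )
    return lower, upper


# Toy database (made-up data): each transaction is a set of items.
toy_db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a"},
    {"d"},
]

print(union_count_by_pie(toy_db, ["a", "b", "c"]))  # 5
print(support_bounds_3(toy_db, "a", "b", "c"))      # (1, 2)
print(support(toy_db, ["a", "b", "c"]))             # 1, inside the bounds
```

On this toy data the true support of {a, b, c} is 1 and the bounds are [1, 2], so any estimate picked from that interval is off by at most one transaction. How tight such bounds are depends on the data, which is consistent with the abstract's point that grouping highly similar members can make the approximation more accurate.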