Mining frequent itemsets in large data warehouses: a novel approach proposed for sparse data sets

Authors:
S. M. Fakhrahmad;M. Zolghadri Jahromi;M. H. Sadreddini
Affiliations:
Faculty member in Department of Computer Eng., Islamic Azad University of Shiraz and Shiraz University, Shiraz, Iran;Department of Computer Science & Engineering, Shiraz University, Shiraz, Iran;Department of Computer Science & Engineering, Shiraz University, Shiraz, Iran
Venue:
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Year:
2007


Abstract

Efficient techniques for discovering useful information and valuable knowledge in very large databases and data warehouses have attracted the attention of many data mining researchers. The well-known Association Rule Mining (ARM) algorithm, Apriori, searches for frequent itemsets (i.e., sets of items whose support meets a minimum threshold) by scanning the whole database repeatedly to count the frequency of each candidate itemset. Most methods proposed to improve the efficiency of Apriori attempt to count the frequency of each itemset without re-scanning the database. However, these methods rarely offer any way to reduce the complexity of the enumerations that are inherent in the problem. In this paper, we propose a new algorithm for mining frequent itemsets and association rules. The algorithm requires only a single scan of the database: the data is encoded into a compressed form and stored in main memory in a suitable data structure, from which itemset frequencies are then computed. The algorithm works iteratively, and in each iteration the time required to measure the frequency of an itemset decreases further (i.e., checking the frequency of n-dimensional candidate itemsets is much faster than checking those of n-1 dimensions). The efficiency of the algorithm is evaluated on artificial and real-life datasets. Experimental results indicate that it is more efficient than existing algorithms.
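
The abstract does not describe the encoding or data structure the authors use, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes a generic vertical layout in which each item is mapped, in one pass over the data, to a bitmask of the transactions that contain it; support is then counted by intersecting bitmasks in memory, with Apriori-style pruning of candidates. The function names `build_tidsets` and `frequent_itemsets` and the absolute-count `min_support` parameter are hypothetical choices made for this illustration.

```python
from itertools import combinations


def popcount(bits):
    """Number of set bits, i.e. number of transactions in a tid bitmask."""
    return bin(bits).count("1")


def build_tidsets(transactions):
    """Single pass over the data: map each item to a bitmask of transaction ids."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets[item] = tidsets.get(item, 0) | (1 << tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Assumption: min_support is an absolute transaction count, not a ratio.
    """
    tidsets = build_tidsets(transactions)  # the only scan of the raw data

    # Level 1: frequent single items with their bitmasks.
    level = {
        frozenset([item]): bits
        for item, bits in tidsets.items()
        if popcount(bits) >= min_support
    }
    result = {}

    while level:
        for itemset, bits in level.items():
            result[itemset] = popcount(bits)

        next_level = {}
        for a, b in combinations(level, 2):
            candidate = a | b
            # Only join pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in level for x in candidate):
                continue
            # Support comes from an in-memory intersection, not a database scan.
            bits = level[a] & level[b]
            if popcount(bits) >= min_support:
                next_level[candidate] = bits
        level = next_level

    return result
```

In this sketch the raw transactions are read once, and every later support count is a bitwise AND of two masks already in memory. On sparse data the masks of larger itemsets contain fewer set bits and fewer candidates survive pruning, which loosely mirrors the abstract's remark that later iterations are cheaper; the paper's own mechanism for that speed-up may differ.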