Perfect Hashing Schemes for Mining Association Rules

Authors:
Chin-Chen Chang; Chih-Yang Lin
Affiliations:
Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan 621, Republic of China (both authors)
Venue:
The Computer Journal
Year:
2005

Abstract

Hashing schemes are widely used to improve the performance of association rule mining. For example, the DHP (Direct Hashing and Pruning) algorithm uses a hash table to decide whether candidate itemsets are valid according to the access count of each bucket. However, the hash table used in DHP suffers from collisions: a bucket count only bounds the support of the itemsets that share that bucket, so generating the large itemsets at each level requires two database scans, which degrades performance. In this paper we propose perfect hashing schemes that avoid collisions in the hash table. The main idea is a refined encoding scheme that transforms large itemsets into large 2-itemsets, which makes perfect hashing feasible. Our experimental results show that the new method is efficient (about three times faster than DHP) and scales well as the database size increases. We also propose a variant of the perfect hashing scheme with reduced memory requirements, and we investigate and compare the properties and performance of several perfect hashing schemes.
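The contrast between a collision-prone hash table and a perfect hash can be illustrated for 2-itemsets. The sketch below is not the authors' encoding scheme. It assumes items are already encoded as integers 0..n-1, uses a hypothetical modular bucket function `(i * n + j) % num_buckets` in place of DHP's hash, and swaps in the standard triangular indexing of pairs as a minimal perfect hash. The function names `pair_index`, `dhp_candidates` and `perfect_frequent_pairs` are illustrative only.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def pair_index(i: int, j: int, n: int) -> int:
    """Minimal perfect hash of the 2-itemset {i, j}, with 0 <= i < j < n.

    Pairs whose first item is smaller than i fill
    sum_{k<i} (n - 1 - k) = i * (2n - i - 1) / 2 slots, so every pair
    gets its own slot in the range [0, n(n-1)/2).
    """
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def _pairs(transaction: Sequence[int]):
    # Every 2-item subset of the transaction's distinct items, in ascending order.
    return combinations(sorted(set(transaction)), 2)


def dhp_candidates(transactions, n: int, num_buckets: int, min_sup: int) -> List[Pair]:
    """DHP-style filter: pairs may share a bucket, so its count is only an upper bound."""
    def bucket(i: int, j: int) -> int:
        return (i * n + j) % num_buckets  # hypothetical hash; collisions are possible

    buckets = [0] * num_buckets
    for t in transactions:
        for i, j in _pairs(t):
            buckets[bucket(i, j)] += 1
    # Surviving pairs are only candidates; their exact support still needs another scan.
    return [p for p in combinations(range(n), 2) if buckets[bucket(*p)] >= min_sup]


def perfect_frequent_pairs(transactions, n: int, min_sup: int) -> List[Pair]:
    """Collision-free counting: each slot holds the exact support of exactly one pair."""
    counts = [0] * (n * (n - 1) // 2)
    for t in transactions:
        for i, j in _pairs(t):
            counts[pair_index(i, j, n)] += 1
    return [p for p in combinations(range(n), 2) if counts[pair_index(*p, n)] >= min_sup]


if __name__ == "__main__":
    db = [[0, 1, 2], [0, 1], [1, 3], [2, 3]]
    print(dhp_candidates(db, n=4, num_buckets=3, min_sup=2))  # [(0, 1), (0, 2), (1, 3), (2, 3)]
    print(perfect_frequent_pairs(db, n=4, min_sup=2))         # [(0, 1)]
```

With three buckets, the toy database lets three infrequent pairs pass the DHP filter because they collide with other pairs, whereas the perfect hash returns only the truly frequent pair (0, 1) after a single pass. The price is one counter per possible pair, n(n-1)/2 in total, which is the kind of memory cost that motivates a reduced-memory variant.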