Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining

Authors:
Mohammad El-Hajj;Osmar R. Zaïane
Affiliations:
University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is their high memory dependency: either the gigantic data structure they build is assumed to fit in main memory, or the recursive mining process consumes too much memory. Another major impediment is the repetitive and interactive nature of any knowledge discovery process. Tuning parameters requires many runs of the same algorithm, so these huge data structures are rebuilt time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout, also called Inverted Matrix, that avoids multiple scans of the database during the mining phase: frequent patterns can be found in less than one full scan, using random access. Second, for each frequent item, a relatively small independent tree is built that summarizes its co-occurrences. Finally, a simple, non-recursive mining process reduces the memory requirements because only minimal candidate generation and counting are needed. Experimental studies reveal that our Inverted Matrix approach outperforms FP-Tree, especially when mining very large transactional databases with a very large number of unique items. Our random-access, disk-based approach is particularly advantageous in a repetitive and interactive setting.
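
The abstract does not spell out the layout itself, so the following is only a simplified sketch of the general idea, not the authors' data structure. It assumes an in-memory stand-in for the disk layout in which each item has one row of transaction ids and rows are sorted by ascending frequency, so that changing the support threshold only changes where reading starts. Plain dictionaries of co-occurrence counts stand in for the per-item trees. All function names (`build_inverted_layout`, `frequent_rows`, `cooccurrence_counts`) are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_inverted_layout(transactions):
    """Build the layout once: one row per item, rows sorted by ascending frequency.

    Assumption: a row is just the list of transaction ids containing the item.
    """
    rows = defaultdict(list)
    for tid, items in enumerate(transactions):
        for item in set(items):  # ignore duplicate items inside a transaction
            rows[item].append(tid)
    # Ascending frequency, ties broken by item name for a stable order.
    return sorted(rows.items(), key=lambda kv: (len(kv[1]), kv[0]))


def frequent_rows(layout, min_support):
    """Return only the rows whose frequency reaches min_support.

    Because rows are sorted by frequency, infrequent rows form a prefix
    that is skipped without being read.
    """
    freqs = [len(tids) for _, tids in layout]
    return layout[bisect_left(freqs, min_support):]


def cooccurrence_counts(layout, min_support):
    """For each frequent item, count co-occurrences with the more frequent items.

    Assumption: a flat dict replaces the small per-item tree of the paper.
    """
    rows = frequent_rows(layout, min_support)
    result = {}
    for i, (item, tids) in enumerate(rows):
        tidset = set(tids)
        counts = {}
        for other, other_tids in rows[i + 1:]:
            shared = sum(1 for t in other_tids if t in tidset)
            if shared >= min_support:
                counts[other] = shared
        result[item] = counts
    return result


transactions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c"],
    ["a"],
    ["b", "c"],
]
layout = build_inverted_layout(transactions)      # built once
pairs_at_2 = cooccurrence_counts(layout, 2)       # first interactive run
pairs_at_4 = cooccurrence_counts(layout, 4)       # rerun, same layout, new threshold
```

The point of the sketch is the interactive setting: `layout` is built a single time, and each new support threshold reuses it rather than rescanning the transactions.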