Efficient colossal pattern mining in high dimensional datasets

Authors:
Mohammad Karim Sohrabi; Ahmad Abdollahzadeh Barforoush
Affiliations:
ISLAB, Computer Engineering & IT Department, Amirkabir University of Technology, 424 Hafez Ave., Tehran 15914, Iran (both authors)
Venue:
Knowledge-Based Systems
Year:
2012

Abstract

Frequent pattern mining is an important data mining problem that has been studied extensively over the last decade. Many algorithms have been developed for frequent pattern mining on traditional commercial datasets, which typically contain a very large number of transactions but only a few items per transaction. The advent of bioinformatics has given rise to a new kind of dataset, called high dimensional, characterized by a small number of transactions and a large number of items per transaction. Because the running time of traditional algorithms grows exponentially with the average transaction length, these algorithms are not suitable for high dimensional datasets. In addition, mining high dimensional datasets produces a very large output set, including small and mid-sized frequent patterns that carry no useful information for scientists. Colossal pattern mining has been proposed as a way to reduce the size of this output set: only very large (colossal) patterns are extracted, and because small and mid-sized patterns are never mined, colossal pattern mining algorithms also run faster. In this paper we present an efficient vertical, bottom-up method for mining frequent colossal patterns in high dimensional datasets. Our algorithm uses a bit matrix to compress the dataset and to make it convenient to access during mining. Experimental results show that the algorithm mines efficiently on a variety of input datasets, and our performance study shows that it substantially outperforms the best previous algorithms.
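The abstract does not spell out the algorithm, so what follows is only a minimal Python sketch, under stated assumptions, of the two ideas it names: a bit matrix over transactions and items, and mining driven by the few transactions (rows) rather than the many items. The motivation is the exponential blow-up described above: a single transaction of length l already contains 2^l - 1 non-empty itemsets, whereas n transactions form only C(n, s) groups of s transactions. The function names build_bit_matrix, support, maximal_large_patterns and decode, and the use of a minimum pattern size min_size as a stand-in for "colossal", are hypothetical illustrations. This brute-force row enumeration is not the authors' vertical bottom-up algorithm.

```python
from itertools import combinations


def build_bit_matrix(transactions, items):
    """Pack the 0/1 transaction-by-item matrix into Python ints.

    rows[t] has bit j set iff transaction t contains items[j] (horizontal view).
    cols[j] has bit t set iff transaction t contains items[j] (vertical view).
    """
    index = {item: j for j, item in enumerate(items)}
    rows = [0] * len(transactions)
    cols = [0] * len(items)
    for t, transaction in enumerate(transactions):
        for item in transaction:
            j = index[item]
            rows[t] |= 1 << j
            cols[j] |= 1 << t
    return rows, cols


def support(item_mask, cols, n_transactions):
    """Count transactions that contain every item in item_mask (AND of item columns)."""
    covered = (1 << n_transactions) - 1  # start with all transactions
    for j, col in enumerate(cols):
        if (item_mask >> j) & 1:
            covered &= col
    return bin(covered).count("1")


def maximal_large_patterns(rows, min_support, min_size):
    """Hypothetical row-enumeration sketch, not the paper's algorithm.

    Intersects every group of min_support transactions (AND of row masks),
    keeps intersections with at least min_size items, and returns the
    maximal ones: the maximal frequent itemsets with >= min_size items.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    candidates = set()
    for group in combinations(rows, min_support):
        common = group[0]
        for row in group[1:]:
            common &= row
        if bin(common).count("1") >= min_size:
            candidates.add(common)
    # Drop any candidate that is a strict subset of another candidate.
    return {c for c in candidates
            if not any(c != d and (c & d) == c for d in candidates)}


def decode(mask, items):
    """Turn an item bitmask back into a set of item labels."""
    return {item for j, item in enumerate(items) if (mask >> j) & 1}


if __name__ == "__main__":
    # Toy data: few transactions, relatively long ones.
    transactions = [list("abcde"), list("abcdf"), list("abcef"), list("bg")]
    items = sorted({i for t in transactions for i in t})
    rows, cols = build_bit_matrix(transactions, items)

    abc = sum(1 << items.index(x) for x in "abc")
    print(support(abc, cols, len(transactions)))  # 3

    found = maximal_large_patterns(rows, min_support=2, min_size=3)
    for pattern in sorted(sorted(decode(m, items)) for m in found):
        print(pattern)
    # ['a', 'b', 'c', 'd']
    # ['a', 'b', 'c', 'e']
    # ['a', 'b', 'c', 'f']
```

Every maximal frequent itemset is exactly the intersection of any min_support transactions that contain it, which is why keeping the maximal intersections is enough. The cost is C(n, min_support) intersections, acceptable only when the number of transactions n is small, which is the high dimensional case the paper targets.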