Efficient colossal pattern mining in high dimensional datasets

  • Authors:
  • Mohammad Karim Sohrabi;Ahmad Abdollahzadeh Barforoush

  • Affiliations:
  • ISLAB, Computer Engineering & IT Department, Amirkabir University of Technology, 424 Hafez Ave., Tehran 15914, Iran;ISLAB, Computer Engineering & IT Department, Amirkabir University of Technology, 424 Hafez Ave., Tehran 15914, Iran

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2012

Abstract

Frequent pattern mining is an important data mining problem that has been studied extensively over the last decade. Many algorithms have been developed for frequent pattern mining on traditional commercial datasets, which usually contain a huge number of transactions with a small number of items in each transaction. The advent of bioinformatics has produced a new form of dataset, called high dimensional, which is characterized by a small number of transactions and a large number of items in each transaction. The running time of traditional algorithms increases exponentially with the average transaction length, so these algorithms are not suitable for high dimensional datasets. Moreover, mining algorithms on high dimensional datasets produce a very large output set that includes small and mid-sized frequent patterns, which carry no useful information for scientists. Colossal pattern mining has been proposed as a way to reduce the size of this output set: only very large (colossal) patterns are extracted, and because small and mid-sized patterns are not mined, the mining process is also faster. In this paper we present an efficient vertical bottom-up method for mining frequent colossal patterns in high dimensional datasets. Our algorithm uses a bit matrix to compress the dataset and make it easy to use in the mining process. Our experimental results show that the algorithm attains very good mining efficiency on various input datasets. Furthermore, our performance study shows that it substantially outperforms the best previous algorithms.
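
The abstract does not spell out how the bit matrix is laid out, so the following is only a minimal sketch of one common way such a structure can work, not the authors' algorithm. It assumes a vertical layout in which each item maps to a bit vector over transactions (stored here as a Python integer), and the support of a pattern is the number of set bits after AND-ing the vectors of its items. The names `build_bit_matrix` and `support` and the toy dataset are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_bit_matrix(transactions: List[Set[str]]) -> Dict[str, int]:
    """Return a vertical bit matrix: bit i of columns[item] is 1
    when transaction i contains that item (assumed layout)."""
    columns: Dict[str, int] = {}
    for row_index, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << row_index)
    return columns


def support(columns: Dict[str, int], pattern: Iterable[str], n_rows: int) -> int:
    """Count the transactions that contain every item of the pattern
    by AND-ing the item bit vectors and counting the remaining 1 bits."""
    mask = (1 << n_rows) - 1  # start with all transactions selected
    for item in pattern:
        mask &= columns.get(item, 0)  # unknown items match no transaction
    return bin(mask).count("1")


# Toy dataset: few transactions, illustrative only.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "e"},
]
columns = build_bit_matrix(transactions)

print(support(columns, {"a", "b"}, len(transactions)))       # 3
print(support(columns, {"a", "b", "c"}, len(transactions)))  # 2
```

With few transactions and many items, each bit vector stays short (one bit per transaction), which is one reason a representation of this kind suits high dimensional data.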