TP+close: mining frequent closed patterns in gene expression datasets

  • Authors:
  • YuQing Miao;GuoLiang Chen;Bin Song;ZhiHao Wang

  • Affiliations:
  • Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Department of Computer Science, Case Western Reserve University, Cleveland;Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China

  • Venue:
  • VDMB'06 Proceedings of the First international conference on Data Mining and Bioinformatics
  • Year:
  • 2006

Abstract

Unlike traditional datasets, gene expression datasets typically contain a huge number of items and few transactions. Although many algorithms have been developed for mining frequent closed patterns, their running time increases exponentially with the average transaction length, so most current methods are impractical for high-dimensional gene expression datasets. In this paper, we propose a new data structure, the tidset-prefix-plus tree (TP+-tree), to store the compressed transposed table of a dataset. Based on the TP+-tree, we develop an algorithm, TP+close, for mining frequent closed patterns in gene expression datasets. TP+close adopts top-down and divide-and-conquer search strategies on the transaction space, and combines efficient pruning with effective optimization methods. Experiments on real-life gene expression datasets show that TP+close is faster than RERII and CARPENTER, two existing algorithms.
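To make the terms in the abstract concrete, the following is a minimal sketch of a transposed table (each item mapped to its tidset) and of closed-pattern mining by enumerating sets of transactions rather than sets of items. It is not the TP+-tree or the TP+close algorithm: it is a brute-force illustration with no prefix tree, no pruning, and no divide-and-conquer, and the names `transpose` and `closed_patterns` are hypothetical. It assumes `min_support` is an absolute count of at least 1.

```python
from itertools import combinations


def transpose(transactions):
    """Build the transposed table: item -> tidset (ids of transactions containing it)."""
    table = {}
    for tid, items in enumerate(transactions):
        for item in items:
            table.setdefault(item, set()).add(tid)
    return table


def closed_patterns(transactions, min_support):
    """Brute-force search over the transaction space, largest row sets first.

    The items shared by any set of rows form a closed pattern, so enumerating
    every row set of size >= min_support yields all frequent closed patterns.
    Illustrative only: the cost is exponential in the number of transactions.
    """
    table = transpose(transactions)
    n = len(transactions)
    result = {}
    for size in range(n, min_support - 1, -1):
        for rows in combinations(range(n), size):
            # Items whose tidset covers every row in the current row set.
            pattern = frozenset(
                item for item, tids in table.items() if tids.issuperset(rows)
            )
            if pattern and pattern not in result:
                # Support is the size of the pattern's full tidset.
                tidset = set.intersection(*(table[item] for item in pattern))
                result[pattern] = len(tidset)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
    for pattern, support in closed_patterns(data, min_support=2).items():
        print(sorted(pattern), support)
```

Enumerating row sets is attractive only when transactions are few, which is the setting the abstract describes; with thousands of items and a few dozen rows, the row space is far smaller than the item space.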