TP+close: mining frequent closed patterns in gene expression datasets

Authors:
YuQing Miao;GuoLiang Chen;Bin Song;ZhiHao Wang
Affiliations:
Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China;Department of Computer Science, Case Western Reserve University, Cleveland;Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Venue:
VDMB '06: Proceedings of the First International Conference on Data Mining and Bioinformatics
Year:
2006

Cited by: 0
References (7):
Mining frequent patterns without candidate generation. SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
Discovering Frequent Closed Itemsets for Association Rules. ICDT '99: Proceedings of the 7th International Conference on Database Theory.
Fast Algorithms for Mining Association Rules in Large Databases. VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases.
Using transposition for pattern discovery from microarray data. DMKD '03: Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
Carpenter: finding closed patterns in long biological datasets. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Biclustering Algorithms for Biological Data Analysis: A Survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).
Mining Frequent Closed Patterns in Microarray Data. ICDM '04: Proceedings of the Fourth IEEE International Conference on Data Mining.

Abstract

Unlike traditional datasets, gene expression datasets typically contain a huge number of items and few transactions. Although many algorithms have been developed for mining frequent closed patterns, their running time increases exponentially with the average transaction length, so most existing methods are impractical for high-dimensional gene expression datasets. In this paper, we propose a new data structure, the tidset-prefix-plus tree (TP+-tree), to store the compressed transposed table of a dataset. Based on the TP+-tree, we develop an algorithm, TP+close, for mining frequent closed patterns in gene expression datasets. TP+close adopts top-down and divide-and-conquer search strategies on the transaction space and combines efficient pruning with effective optimization methods. Experiments on real-life gene expression datasets show that TP+close is faster than RERII and CARPENTER, two existing algorithms.
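
For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of a transposed table (each item mapped to its tidset, the set of transaction ids that contain it) and of frequent closed patterns found by searching the transaction space. It is not TP+close and does not build a TP+-tree: it enumerates every combination of transactions by brute force, which is only feasible because the number of transactions is small. The function names `transpose` and `closed_patterns` and the toy gene labels are assumptions made for illustration.

```python
from itertools import combinations


def transpose(transactions):
    """Build the transposed table: item -> tidset (ids of transactions containing it)."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def closed_patterns(transactions, min_support):
    """Brute-force search over sets of transactions (illustration only, not TP+close).

    The items shared by any group of transactions form a closed pattern, so
    intersecting every group of at least `min_support` transactions yields
    every frequent closed pattern.
    """
    rows = [frozenset(t) for t in transactions]
    tidsets = transpose(rows)
    result = {}
    for size in range(max(1, min_support), len(rows) + 1):
        for tids in combinations(range(len(rows)), size):
            pattern = frozenset.intersection(*(rows[t] for t in tids))
            if not pattern:
                continue
            # Support = number of transactions containing every item of the pattern.
            support = len(set.intersection(*(tidsets[i] for i in pattern)))
            result[pattern] = support
    return result


# Toy data: three transactions (samples) over three items (genes).
data = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g2", "g3"}]
for pattern, support in closed_patterns(data, min_support=2).items():
    print(sorted(pattern), support)
```

On the toy data, `{g1}` is frequent but not closed, because `{g1, g2}` occurs in exactly the same transactions; only the larger pattern is reported. The cost of this sketch grows exponentially with the number of transactions, which is the kind of blow-up that pruning in a real algorithm is meant to avoid.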