CARPENTER: finding closed patterns in long biological datasets

Authors:
Feng Pan; Gao Cong; Anthony K. H. Tung; Jiong Yang; Mohammed J. Zaki
Affiliations:
National University of Singapore; National University of Singapore; National University of Singapore; University of Illinois at Urbana-Champaign; Rensselaer Polytechnic Institute
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets contain 10,000 to 100,000 columns but only 100 to 1,000 rows. Such datasets pose a great challenge for existing (closed) frequent pattern discovery algorithms, because their running time grows exponentially with the average row length. In this paper, we describe a new algorithm called CARPENTER that is specially designed to handle datasets with a large number of attributes and a relatively small number of rows. Experiments on several real bioinformatics datasets show that CARPENTER is orders of magnitude faster than previous closed pattern mining algorithms such as CLOSET and CHARM.
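The abstract contrasts algorithms that enumerate combinations of columns (items), whose cost grows exponentially with row length, against an approach that exploits the small number of rows. The full CARPENTER paper does this by enumerating sets of rows over a transposed table: the closed pattern for a row set is the intersection of those rows' items. The sketch below illustrates only that row-enumeration idea under simplifying assumptions. It is not the authors' implementation: the function name `mine_closed_by_rows`, the in-memory list-of-sets input, and the pruning shown are hypothetical choices, and it omits the projected transposed tables the paper relies on for efficiency.

```python
from typing import Dict, FrozenSet, List, Set


def mine_closed_by_rows(rows: List[Set[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical sketch: find closed itemsets with support >= min_sup
    by enumerating row sets (row enumeration) instead of item sets.

    rows: each row is the set of items (e.g. discretized gene states) it contains.
    Returns a dict mapping each closed itemset to its support (row count).
    """
    n = len(rows)
    result: Dict[FrozenSet[str], int] = {}

    def support_set(items: FrozenSet[str]) -> List[int]:
        # Indices of all rows that contain every item in `items`.
        return [j for j in range(n) if items <= rows[j]]

    def expand(row_ids: List[int], items: FrozenSet[str]) -> None:
        # `items` is the intersection of the rows in `row_ids` (kept sorted).
        last = row_ids[-1]
        # Any pattern reported below this node is supported by exactly the
        # current rows plus some later rows, so this bounds its support.
        if len(row_ids) + (n - 1 - last) < min_sup:
            return
        sup = support_set(items)
        in_r = set(row_ids)
        # An earlier row outside the current set also contains `items`:
        # every pattern in this subtree is found on another branch.
        if any(j < last and j not in in_r for j in sup):
            return
        # The row set equals the full support set, so `items` is closed
        # and this is the unique node that reports it.
        if len(sup) == len(row_ids) and len(row_ids) >= min_sup:
            result[items] = len(sup)
        # Extend with later rows; an empty intersection cannot recover.
        for j in range(last + 1, n):
            nxt = items & rows[j]
            if nxt:
                expand(row_ids + [j], nxt)

    for i in range(n):
        if rows[i]:
            expand([i], frozenset(rows[i]))
    return result


if __name__ == "__main__":
    # Toy data: 4 rows over items a-d (illustrative only, not from the paper).
    data = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"b", "c"}]
    for pattern, sup in sorted(mine_closed_by_rows(data, 2).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), sup)
```

Because the search tree branches on rows, its shape depends on the number of rows rather than the number of columns, which is the property the abstract highlights for datasets with tens of thousands of columns but only hundreds of rows.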