On compressing frequent patterns

Authors:
Dong Xin;Jiawei Han;Xifeng Yan;Hong Cheng
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Rm 2132, Siebel Center for Computer Science, 201 N. Goodwin Avenue, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Rm 2132, Siebel Center for Computer Science, 201 N. Goodwin Avenue, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Rm 2132, Siebel Center for Computer Science, 201 N. Goodwin Avenue, Urbana, IL 61801, USA;Department of Computer Science, University of Illinois at Urbana-Champaign, Rm 2132, Siebel Center for Computer Science, 201 N. Goodwin Avenue, Urbana, IL 61801, USA
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster them with a tightness measure δ (each such cluster is called a δ-cluster) and to select a representative pattern for each cluster. The problem of finding a minimum set of representative patterns is shown to be NP-hard. We develop two greedy methods, RPglobal and RPlocal. The former has a guaranteed compression bound but higher computational complexity; the latter sacrifices the theoretical bound but is far more efficient. Our performance study shows that the compression quality of RPlocal is very close to that of RPglobal, and that both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose [G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, in: Proc. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03)], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined to balance quality and efficiency.
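
To make the idea of choosing representatives for δ-clusters concrete, here is a minimal Python sketch. It is not the authors' RPglobal or RPlocal implementation. The abstract does not define the tightness measure, so the sketch assumes two things: the distance between two patterns is a Jaccard-style distance between the sets of transactions that contain them, and a representative may only cover patterns that are its subsets. The selection loop is a generic greedy set-cover heuristic, and every name in it (`jaccard_distance`, `covers`, `greedy_representatives`) is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Assumed tightness measure: 1 - |a & b| / |a | b| on transaction-id sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def covers(rep: Pattern, pat: Pattern,
           tids: Dict[Pattern, Set[int]], delta: float) -> bool:
    """Assumed coverage rule: pat is a subset of rep and lies within distance delta."""
    return pat <= rep and jaccard_distance(tids[pat], tids[rep]) <= delta


def greedy_representatives(tids: Dict[Pattern, Set[int]],
                           delta: float) -> List[Pattern]:
    """Generic greedy set cover: repeatedly pick the pattern covering most uncovered ones."""
    # Fixed candidate order so that ties are broken deterministically.
    candidates = sorted(tids, key=lambda p: tuple(sorted(p)))
    uncovered = set(tids)
    chosen: List[Pattern] = []
    while uncovered:
        best = max(
            candidates,
            key=lambda rep: sum(covers(rep, p, tids, delta) for p in uncovered),
        )
        chosen.append(best)
        # Every pattern covers itself, so each round removes at least one pattern.
        uncovered = {p for p in uncovered if not covers(best, p, tids, delta)}
    return chosen


# Toy database (illustrative only): t1={a,b,c}, t2={a,b,c}, t3={a,b}, t4={a}.
# Each frequent pattern is mapped to the ids of the transactions containing it.
toy = {
    frozenset("a"): {1, 2, 3, 4},
    frozenset("b"): {1, 2, 3},
    frozenset("ab"): {1, 2, 3},
    frozenset("c"): {1, 2},
    frozenset("ac"): {1, 2},
    frozenset("bc"): {1, 2},
    frozenset("abc"): {1, 2},
}

reps_loose = greedy_representatives(toy, delta=0.4)
reps_exact = greedy_representatives(toy, delta=0.0)
print(len(reps_loose), len(reps_exact))
```

On this toy input, a looser δ yields fewer representatives than δ = 0. Under the assumptions above, δ = 0 only merges patterns with identical transaction sets, so it keeps exactly the closed patterns of the toy data.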