Mining compressed frequent-pattern sets

Authors:
Dong Xin;Jiawei Han;Xifeng Yan;Hong Cheng
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Citing 14
Cited 53

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Efficient mining of emerging patterns: discovering trends and differences

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Introduction to Algorithms

Introduction to Algorithms
Discovery of Frequent Episodes in Event Sequences

Data Mining and Knowledge Discovery
Clustering Association Rules

ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Scalable Techniques for Mining Causal Structures

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
CLOSET+: searching for the best strategies for mining frequent closed itemsets

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Approximating a collection of frequent sets

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Generating semantic annotations for frequent patterns with context analysis

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Center-piece subgraphs: problem definition and fast solutions

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Extracting redundancy-aware top-k patterns

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Discovering interesting patterns through user's interactive feedback

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Fg-index: towards verification-free query processing on graph databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Semantic annotation of frequent patterns

ACM Transactions on Knowledge Discovery from Data (TKDD)
Mining approximate top-k subspace anomalies in multi-dimensional time-series data

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A survey on algorithms for mining frequent itemsets over data streams

Knowledge and Information Systems
Effective and efficient itemset pattern summarization: regression-based approaches

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Colibri: fast mining of large static and dynamic graphs

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Comparing Datasets Using Frequent Itemsets: Dependency on the Mining Parameters

SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
Capturing association among items in a database

Data & Knowledge Engineering
On effective presentation of graph patterns: a structural representative approach

Proceedings of the 17th ACM conference on Information and knowledge management
Fast mining of complex time-stamped events

Proceedings of the 17th ACM conference on Information and knowledge management
Efficient algorithms for incremental maintenance of closed sequential patterns in large databases

Data & Knowledge Engineering
Identifying Users Stereotypes with Semantic Web Mining

ER '08 Proceedings of the ER 2008 Workshops (CMLSA, ECDM, FP-UML, M2AS, RIGiM, SeCoGIS, WISM) on Advances in Conceptual Modeling: Challenges and Opportunities
Mining Long, Sharable Patterns in Trajectories of Moving Objects

Geoinformatica
Effective temporal data classification by integrating sequential pattern mining and probabilistic induction

Expert Systems with Applications: An International Journal
Cartesian contour: a concise representation for a collection of frequent sets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
TANGENT: a novel, 'Surprise me', recommendation algorithm

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
CP-summary: a concise representation for browsing frequent itemsets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
An efficient rigorous approach for identifying statistically significant frequent itemsets

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Self-sufficient itemsets: An approach to screening potentially interesting associations between items

ACM Transactions on Knowledge Discovery from Data (TKDD)
Output space sampling for graph patterns

Proceedings of the VLDB Endowment
Mining problem-solving strategies from HCI data

ACM Transactions on Computer-Human Interaction (TOCHI)
IMCS: incremental mining of closed sequential patterns

APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Approximately mining recently representative patterns on data streams

PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining
Mining representative subspace clusters in high-dimensional data

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Frequent subtrees minging based on projected node

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7
Block interaction: a generative summarization scheme for frequent patterns

Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Metric forensics: a multi-level approach for mining volatile graphs

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Constructing classification features using minimal predictive patterns

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Approximate weighted frequent pattern mining with/without noisy environments

Knowledge-Based Systems
ESTATE: strategy for exploring labeled spatial datasets using association analysis

DS'10 Proceedings of the 13th international conference on Discovery science
Krimp: mining itemsets that compress

Data Mining and Knowledge Discovery
Summarizing transactional databases with overlapped hyperrectangles

Data Mining and Knowledge Discovery
Diversified ranking on large graphs: an optimization viewpoint

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
GBASE: a scalable and general graph management system

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient mining of top correlated patterns based on null-invariant measures

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Fast graph query processing with a low-cost index

The VLDB Journal — The International Journal on Very Large Data Bases
Summarizing frequent patterns using profiles

DASFAA'06 Proceedings of the 11th international conference on Database Systems for Advanced Applications
Mining compressed sequential patterns

ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
Transaction databases, frequent itemsets, and their condensed representations

KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

Journal of the ACM (JACM)
Non-negative residual matrix factorization: problem definition, fast solutions, and applications

Statistical Analysis and Data Mining
Gateway finder in large graphs: problem definitions and fast solutions

Information Retrieval
Finding minimum representative pattern sets

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for summarizing and analyzing twitter feeds

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A pattern discovery model for effective text mining

MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
gbase: an efficient analysis platform for large graphs

The VLDB Journal — The International Journal on Very Large Data Bases
Summarizing probabilistic frequent patterns: a fast approach

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
A performance study of three disk-based structures for indexing and querying frequent itemsets

Proceedings of the VLDB Endowment

Abstract

A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns, but a low one may generate an explosive number of output patterns, which severely restricts its usage. In this paper, we study the problem of compressing frequent-pattern sets. Typically, frequent patterns can be clustered with a tightness measure δ (called a δ-cluster), and a representative pattern can be selected for each cluster. Unfortunately, finding a minimum set of representative patterns is NP-hard. We develop two greedy methods, RPglobal and RPlocal. The former has a guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bound but is far more efficient. Our performance study shows that the compression quality of RPlocal is very close to that of RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose [11], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined to balance quality and efficiency.
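
The idea of picking one representative per δ-cluster can be illustrated with a small greedy set-cover sketch. This is not the authors' RPglobal or RPlocal implementation. The abstract does not define the tightness measure or the covering rule, so the sketch assumes a Jaccard-style distance between the sets of transactions supporting two patterns, and assumes that a representative covers a pattern when it contains the pattern and lies within distance δ of it. All function names and the toy data are hypothetical.

```python
def support_set(pattern, transactions):
    """Indices of the transactions that contain every item of the pattern."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)


def distance(p, q, transactions):
    """Jaccard distance between supporting transaction sets (assumed measure)."""
    tp, tq = support_set(p, transactions), support_set(q, transactions)
    union = tp | tq
    return 1.0 - len(tp & tq) / len(union) if union else 0.0


def covers(rep, p, transactions, delta):
    """Assumed rule: rep covers p if p is a subset of rep and both are within delta."""
    return p <= rep and distance(p, rep, transactions) <= delta


def greedy_representatives(patterns, transactions, delta):
    """Greedy set cover: repeatedly pick the pattern that covers the most uncovered ones."""
    uncovered = set(patterns)
    chosen = []
    while uncovered:
        # Every pattern covers itself, so the best candidate always covers at least one.
        best = max(
            patterns,
            key=lambda r: sum(covers(r, p, transactions, delta) for p in uncovered),
        )
        chosen.append(best)
        uncovered -= {p for p in uncovered if covers(best, p, transactions, delta)}
    return chosen


# Toy data (hypothetical): five transactions and five frequent patterns.
transactions = [
    frozenset("abc"), frozenset("abc"), frozenset("abc"),
    frozenset("ab"), frozenset("d"),
]
patterns = [frozenset("a"), frozenset("b"), frozenset("ab"),
            frozenset("abc"), frozenset("d")]

loose = greedy_representatives(patterns, transactions, delta=0.3)
tight = greedy_representatives(patterns, transactions, delta=0.1)
print(len(loose), len(tight))
```

With the looser δ, the pattern {a, b, c} stands in for {a}, {b}, and {a, b}, so fewer representatives are needed. A tighter δ forces more of them, which mirrors the quality versus compression trade-off the abstract describes.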