Summarizing itemset patterns using probabilistic models

Authors:
Chao Wang;Srinivasan Parthasarathy
Affiliations:
Ohio State University, Columbus, OH;Ohio State University, Columbus, OH
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Citing 12
Cited 27

Data mining, hypergraph transversals, and machine learning (extended abstract)

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Statistical methods for speech recognition

Statistical methods for speech recognition
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Mining All Non-derivable Frequent Itemsets

PKDD '02 Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Mining Top-K Frequent Closed Patterns without Minimum Support

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
New Algorithms for Fast Discovery of Association Rules

New Algorithms for Fast Discovery of Association Rules
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data

IEEE Transactions on Knowledge and Data Engineering
Approximating a collection of frequent sets

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Summarizing itemset patterns: a profile-based approach

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Cache-conscious frequent pattern mining on a modern processor

VLDB '05 Proceedings of the 31st international conference on Very large data bases
GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets

Data Mining and Knowledge Discovery

From frequent itemsets to semantically meaningful visual patterns

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Effective and efficient itemset pattern summarization: regression-based approaches

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
On effective presentation of graph patterns: a structural representative approach

Proceedings of the 17th ACM conference on Information and knowledge management
Cartesian contour: a concise representation for a collection of frequent sets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
CP-summary: a concise representation for browsing frequent itemsets

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient algorithms for mining constrained frequent patterns from uncertain data

Proceedings of the 1st ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data
Output space sampling for graph patterns

Proceedings of the VLDB Endowment
Mining problem-solving strategies from HCI data

ACM Transactions on Computer-Human Interaction (TOCHI)
Finding composite episodes

MCD'07 Proceedings of the 3rd ECML/PKDD international conference on Mining complex data
Mining representative subspace clusters in high-dimensional data

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Efficient algorithms for the mining of constrained frequent patterns from uncertain data

ACM SIGKDD Explorations Newsletter
Block interaction: a generative summarization scheme for frequent patterns

Proceedings of the ACM SIGKDD Workshop on Useful Patterns
Mining periodic behaviors for moving objects

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Towards site-based protein functional annotations

International Journal of Data Mining and Bioinformatics
ESTATE: strategy for exploring labeled spatial datasets using association analysis

DS'10 Proceedings of the 13th international conference on Discovery science
Krimp: mining itemsets that compress

Data Mining and Knowledge Discovery
MoveMine: Mining moving object data for discovery of animal movement patterns

ACM Transactions on Intelligent Systems and Technology (TIST)
Tell me what I need to know: succinctly summarizing data with itemsets

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Comparing apples and oranges: measuring differences between data mining results

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Summarizing frequent itemsets via pignistic transformation

EPIA'11 Proceedings of the 15th Portuguese conference on Progress in artificial intelligence
Mining periodic behaviors of object movements for animal and biological sustainability studies

Data Mining and Knowledge Discovery
Finding minimum representative pattern sets

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for summarizing and analyzing Twitter feeds

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Substructure clustering: a novel mining paradigm for arbitrary data types

SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Summarizing data succinctly with the most informative itemsets

ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Summarizing categorical data by clustering attributes

Data Mining and Knowledge Discovery
Frequent subgraph summarization with error control

WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management

Abstract

In this paper, we propose a novel probabilistic approach to summarizing frequent itemset patterns. Such techniques are useful for summarization, post-processing, and end-user interpretation, particularly for problems where the resulting set of patterns is huge. In our approach, items in the dataset are modeled as random variables. We then construct a Markov Random Field (MRF) on these variables based on frequent itemsets and their occurrence statistics. The summarization proceeds in a level-wise iterative fashion. Occurrence statistics of itemsets at the lowest level are used to construct an initial MRF. Statistics of itemsets at the next level can then be inferred from the model. We use those patterns whose occurrence cannot be accurately inferred from the model to augment the model in an iterative manner, repeating the procedure until all frequent itemsets can be modeled. The resulting MRF model affords a concise and useful representation of the original collection of itemsets. An extensive empirical study on real datasets shows that the new approach can effectively summarize a large number of itemsets and typically significantly outperforms extant approaches.
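
The level-wise procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the model is fitted by iterative proportional fitting over a brute-force enumeration of all 2^n item states (feasible only for a handful of items), an itemset counts as "accurately inferred" when its estimated support is within a relative `tolerance` of its true support, and the names `support`, `fit_model`, `estimate`, and `summarize` are hypothetical.

```python
from itertools import product


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def fit_model(items, constraints, sweeps=200):
    """Fit a distribution over all 2^n item states to the given supports.

    Assumption: iterative proportional fitting stands in for whatever
    learning procedure the paper uses. `constraints` maps frozenset
    itemsets to their target supports.
    """
    states = [frozenset(i for i, bit in zip(items, bits) if bit)
              for bits in product([0, 1], repeat=len(items))]
    prob = {s: 1.0 / len(states) for s in states}
    for _ in range(sweeps):
        for itemset, target in constraints.items():
            inside = sum(p for s, p in prob.items() if itemset <= s)
            outside = 1.0 - inside
            for s in states:
                if itemset <= s:
                    if inside > 0:
                        prob[s] *= target / inside
                elif outside > 0:
                    prob[s] *= (1.0 - target) / outside
    return prob


def estimate(prob, itemset):
    """Support of `itemset` as inferred from the fitted model."""
    return sum(p for s, p in prob.items() if itemset <= s)


def summarize(transactions, frequent_itemsets, tolerance=0.05):
    """Keep only itemsets whose support the current model cannot infer."""
    items = sorted(set().union(*frequent_itemsets))
    by_level = {}
    for itemset in frequent_itemsets:
        by_level.setdefault(len(itemset), []).append(itemset)
    levels = sorted(by_level)

    # The lowest level seeds the initial model.
    kept = {x: support(transactions, x) for x in by_level[levels[0]]}
    prob = fit_model(items, kept)

    for k in levels[1:]:
        changed = False
        for x in by_level[k]:
            true = support(transactions, x)
            if abs(estimate(prob, x) - true) > tolerance * true:
                kept[x] = true  # poorly inferred: add it to the model
                changed = True
        if changed:
            prob = fit_model(items, kept)  # refit before the next level
    return kept, prob


if __name__ == "__main__":
    # Toy data: items a and b always co-occur, c is independent of both.
    tx = [frozenset("abc"), frozenset("ab"), frozenset("c"), frozenset()]
    freq = [frozenset(s) for s in ("a", "b", "c", "ab", "ac", "bc", "abc")]
    kept, _ = summarize(tx, freq)
    print(sorted("".join(sorted(x)) for x in kept))
```

On this toy data the singleton model already predicts the supports of {a, c} and {b, c}, so only {a, b} has to be added at level two, and the refitted model then accounts for {a, b, c} without storing it. Brute-force enumeration keeps the sketch short; a practical system would need an inference method that scales beyond a few items.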