Effective and efficient itemset pattern summarization: regression-based approaches

Authors:
Ruoming Jin;Muad Abu-Ata;Yang Xiang;Ning Ruan
Affiliations:
Kent State University, Kent, OH, USA;Kent State University, Kent, OH, USA;Kent State University, Kent, OH, USA;Kent State University, Kent, OH, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008


Abstract

In this paper, we propose a set of novel regression-based approaches to summarize frequent itemset patterns effectively and efficiently. Specifically, we show that minimizing the restoration error for a set of itemsets under a probabilistic model corresponds to a nonlinear regression problem, and that under certain conditions this nonlinear regression problem can be transformed into a linear one. We propose two new methods, K-regression and tree-regression, to partition the entire collection of frequent itemsets so as to minimize the restoration error. The K-regression approach, which employs a K-means-type clustering method, guarantees that the total restoration error reaches a local minimum. The tree-regression approach employs a decision-tree-style top-down partitioning process. In addition, we discuss alternative ways to estimate the frequencies of the itemsets covered by the k representative itemsets. Experimental evaluation on both real and synthetic datasets demonstrates that our approaches significantly improve summarization performance in terms of both accuracy (restoration error) and computational cost.
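
The abstract does not spell out the probabilistic model or the conditions for linearization, so the following is only a rough sketch under a stated assumption, not the paper's algorithm. It assumes an item-independence model in which an itemset's frequency is approximated by the product of per-item probabilities; taking logarithms then gives a model that is linear in the unknowns. The sketch fits that linear model by ordinary least squares and wraps it in a generic K-means-style loop (refit each group, then reassign each itemset to the group that restores it best). All function names, the tiny ridge term, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(X, y, ridge=1e-9):
    """Least-squares weights via the normal equations (tiny ridge is an assumption
    added here so that small groups still give a solvable system)."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def sq_err(w, x, t):
    """Squared residual of one observation (x, t) under weights w."""
    return (t - sum(wi * xi for wi, xi in zip(w, x))) ** 2


def k_regression(X, y, k, labels, max_iters=50):
    """Alternate per-group refitting and reassignment until labels stop changing."""
    d = len(X[0])
    weights = [[0.0] * d for _ in range(k)]
    history = []  # total squared error after each refit
    for _ in range(max_iters):
        for g in range(k):
            rows = [i for i, l in enumerate(labels) if l == g]
            if rows:  # an empty group keeps its previous weights
                weights[g] = fit([X[i] for i in rows], [y[i] for i in rows])
        history.append(sum(sq_err(weights[l], x, t)
                           for l, x, t in zip(labels, X, y)))
        new = [min(range(k), key=lambda g: sq_err(weights[g], x, t))
               for x, t in zip(X, y)]
        if new == labels:
            break
        labels = new
    return weights, labels, history


# Toy data: two hypothetical independence models over three items.
items = range(3)
itemsets = [set(c) for r in (1, 2, 3) for c in combinations(items, r)]
X1 = [[1.0 if i in s else 0.0 for i in items] for s in itemsets]
P_A, P_B = (0.9, 0.8, 0.5), (0.3, 0.2, 0.6)


def log_freq(p, s):
    """log of the product of item probabilities, i.e. the linearized target."""
    return sum(math.log(p[i]) for i in s)


X = X1 + X1
y = [log_freq(P_A, s) for s in itemsets] + [log_freq(P_B, s) for s in itemsets]

# Linearized fit on one model's itemsets: exp(weight) plays the role of p_i.
recovered = [math.exp(w) for w in fit(X1, y[:len(itemsets)])]

# K-means-style partitioning from an arbitrary alternating initial labeling.
weights, labels, history = k_regression(X, y, 2, [i % 2 for i in range(len(y))])
```

In this sketch, neither the refit step nor the reassignment step can raise the total squared error (up to the effect of the tiny ridge), which is the usual argument for why such an alternating scheme settles at a local minimum. The result depends on the initial labeling, so the partition it finds need not be the globally best one.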