Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation

Authors:
Yongge Wang;Xintao Wu
Affiliations:
University of North Carolina at Charlotte;University of North Carolina at Charlotte
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 6
Cited 6

Probabilistic Satisfiability

Journal of Complexity
A logic for reasoning about probabilities

Information and Computation - Selections from 1988 IEEE symposium on logic in computer science
Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Real world performance of association rule algorithms

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Feasible itemset distributions in data mining: theory and application

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Computational complexity of itemset frequency satisfiability

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Itemset frequency satisfiability: Complexity and axiomatization

Theoretical Computer Science
Anonymizing transaction databases for publication

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Privacy-preserving publishing microdata with full functional dependencies

Data & Knowledge Engineering
A FP-tree-based method for inverse frequent set mining

BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
Interactive pattern mining on hidden data: a sampling-based solution

Proceedings of the 21st ACM international conference on Information and knowledge management
Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs

ACM Transactions on Knowledge Discovery from Data (TKDD)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In order to generate synthetic basket datasets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket datasets. The characteristics that could be used for this purpose include the frequent itemsets and association rules. The problem of generating synthetic basket datasets from frequent itemsets is generally referred to as inverse frequent itemset mining. In this paper, we show that the problem of approximate inverse frequent itemset mining is NP-complete. Then we propose and analyze an approximate algorithm for approximate inverse frequent itemset mining, and discuss privacy issues related to the synthetic basket dataset. In particular, we propose an approximate algorithm to determine the privacy leakage in a synthetic basket dataset.