Inverse frequent set mining (IFM) is the problem of computing a transaction database D that satisfies given support constraints for some itemsets, typically the frequent ones. This article proposes a new formulation of IFM, called IFMI (IFM with infrequency constraints), in which every itemset not listed as frequent is constrained to be infrequent; that is, its support must be less than or equal to a single specified threshold. An instance of IFMI can be viewed as an instance of the original IFM by making the infrequency constraints explicit for the minimal infrequent itemsets, which form the so-called negative generator border defined in the literature. The complexity increase from PSPACE (the complexity of IFM) to NEXP (the complexity of IFMI) is caused by the cardinality of the negative generator border, which can be exponential in the original input size. The article therefore introduces a problem parameter κ that bounds this cardinality from above, using a hypergraph interpretation in which minimal infrequent itemsets correspond to minimal transversals. Fixing a constant k, the article defines a k-bounded version of the problem, called k-IFMI, comprising all instances for which the value of κ is at most k; its complexity is in PSPACE, as for IFM. The bounded problem is encoded as an integer linear program with a large number of variables (exponential in the number of constraints), which is then approximated by relaxing the integrality constraints; the decision problem of solving the resulting linear program is proven to be in NP. To solve the linear program, the article uses column generation, a variant of the simplex method designed for large-scale linear programs, in particular those with a huge number of variables.
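The correspondence between minimal infrequent itemsets and minimal transversals can be made concrete with a small sketch. It is an illustration under stated assumptions, not the article's procedure: the frequent itemsets are assumed to be given by their maximal elements, so an itemset is infrequent exactly when it is not contained in any maximal frequent itemset, that is, when it intersects the complement of every one of them. The negative generator border is then the set of minimal transversals of the hypergraph formed by these complements. The helper names (`minimize`, `minimal_transversals`, `negative_border`) are hypothetical, and Berge's incremental algorithm is used only because it is short; its output, like the border itself, can grow exponentially.

```python
def minimize(sets):
    """Keep only the inclusion-minimal sets (drop any set strictly containing another)."""
    uniq = set(sets)
    return {s for s in uniq if not any(t < s for t in uniq)}


def minimal_transversals(edges):
    """Berge's incremental algorithm for the minimal transversals of a hypergraph.

    After processing each edge E, the current family is
    min{ T | {v} : T in previous family, v in E }, where T is kept
    unchanged if it already intersects E. An empty edge yields no transversal.
    """
    transversals = {frozenset()}
    for edge in edges:
        edge = frozenset(edge)
        transversals = minimize(
            t if t & edge else t | {v}
            for t in transversals
            for v in edge
        )
    return transversals


def negative_border(items, maximal_frequent):
    """Minimal infrequent itemsets = minimal transversals of the complements
    of the maximal frequent itemsets (assumed input representation)."""
    universe = frozenset(items)
    complements = [universe - frozenset(f) for f in maximal_frequent]
    return minimal_transversals(complements)


if __name__ == "__main__":
    border = negative_border("abcd", ["abc", "bd"])
    print(sorted("".join(sorted(s)) for s in border))  # ['ad', 'cd']
```

For items a, b, c, d with maximal frequent itemsets abc and bd, the complements are {d} and {a, c}, so the negative generator border is {a, d} and {c, d}: these are the itemsets that would receive explicit infrequency constraints when the IFMI instance is rewritten as an IFM instance.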
At each step, the method requires solving an auxiliary integer linear program, which is proven to be NP-hard in this case and for which a greedy heuristic is presented. The resulting column generation algorithm scales very well, as shown by extensive experiments, paving the way for its application in real-life scenarios.
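The abstract does not spell out the auxiliary problem or the greedy heuristic, so the following is only a generic sketch of what a greedy pricing step can look like. It assumes a textbook column-generation setup: each column of the master linear program is a candidate transaction T, each row is a support constraint on an itemset S, column T has coefficient 1 in row S exactly when S ⊆ T, and every column has the same cost (1.0 by default). Given the dual values y_S of the restricted master problem, the reduced cost of T is its cost minus the sum of y_S over the constrained itemsets S ⊆ T, and a column with negative reduced cost is a candidate to enter the basis. The names `column_score` and `greedy_pricing`, the unit cost, and the best-single-item growth rule are all assumptions for illustration, not the article's heuristic.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Itemset = FrozenSet[str]


def column_score(transaction: Itemset, duals: Dict[Itemset, float]) -> float:
    """Sum of the dual values of all constrained itemsets contained in the transaction."""
    return sum(y for s, y in duals.items() if s <= transaction)


def greedy_pricing(items: Iterable[str], duals: Dict[Itemset, float],
                   cost: float = 1.0) -> Tuple[Itemset, float]:
    """Grow a transaction one item at a time, always taking the largest score gain.

    Returns the transaction found and its reduced cost (cost - score).
    A negative reduced cost means the column may enter the master LP;
    a non-negative one means this heuristic found no improving column.
    """
    transaction: Itemset = frozenset()
    score = column_score(transaction, duals)
    remaining = set(items)
    while remaining:
        best_item: Optional[str] = None
        best_score = score
        for item in sorted(remaining):  # sorted for deterministic tie-breaking
            candidate = column_score(transaction | {item}, duals)
            if candidate > best_score:
                best_item, best_score = item, candidate
        if best_item is None:
            break  # no single item strictly improves the score: local optimum
        transaction = transaction | {best_item}
        score = best_score
        remaining.remove(best_item)
    return transaction, cost - score


if __name__ == "__main__":
    duals = {frozenset("a"): 0.5, frozenset("b"): 0.4,
             frozenset("ab"): 0.3, frozenset("c"): -0.6}
    t, reduced_cost = greedy_pricing("abc", duals)
    print(sorted(t), round(reduced_cost, 6))
```

With these example duals the sketch builds the transaction {a, b} (score about 1.2, reduced cost about -0.2) and stops because adding c would lower the score. The trade-off is typical of such heuristics: each step is cheap, but a purely additive rule can miss columns whose value appears only when several items are added together (for instance, when only the pair {a, b} has a positive dual), which is why a failed heuristic search does not by itself prove that no improving column exists.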