Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Authors:
Jean-François Boulicaut;Artur Bykowski;Christophe Rigotti
Affiliations:
Laboratoire d'Ingénierie des Systèmes d'Information, INSA Lyon, Bâtiment 501, F-69621 Villeurbanne Cedex, France. Jean-Francois.Boulicaut@insa-lyon.fr;Laboratoire d'Ingénierie des Systèmes d'Information, INSA Lyon, Bâtiment 501, F-69621 Villeurbanne Cedex, France. Artur.Bykowski@insa-lyon.fr;Laboratoire d'Ingénierie des Systèmes d'Information, INSA Lyon, Bâtiment 501, F-69621 Villeurbanne Cedex, France. Christophe.Rigotti@insa-lyon.fr
Venue:
Data Mining and Knowledge Discovery
Year:
2003

Abstract

Given a large collection of transactions containing items, a basic and common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate the support of any itemset (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representations (H. Mannila and H. Toivonen, 1996. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 189–194). We show that frequent free-sets can be extracted efficiently using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments on real dense data sets show a significant reduction in the size of the output compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets remains possible when the extraction of frequent itemsets becomes intractable, and that the supports of the frequent free-sets can be used to approximate the supports of the frequent itemsets very closely. Finally, we consider the effect of this approximation on association rules (a popular kind of pattern that can be derived from frequent itemsets) and show that the corresponding errors remain very low in practice.
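
The notions of support, frequent itemset, and support approximation used in the abstract can be made concrete with a small Python sketch. The abstract does not define free-sets, so the sketch assumes the usual characterization of a δ-free set: removing any single item from the set increases its support by more than δ. The support of a frequent itemset is then approximated by the smallest support among its frequent free subsets. The toy transactions, the function names, and the brute-force enumeration are illustrative assumptions only; this is not the authors' extraction algorithm, which relies on levelwise pruning.

```python
from itertools import combinations

# Toy Boolean data (made up): each transaction is the set of items it contains.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
    {"c"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def is_delta_free(itemset, delta, transactions):
    """Assumed characterization of a delta-free set: dropping any single
    item raises the support by more than `delta`."""
    items = frozenset(itemset)
    sup = support(items, transactions)
    return all(support(items - {a}, transactions) - sup > delta for a in items)


def frequent_free_sets(min_support, delta, transactions):
    """Brute-force enumeration (exponential, for illustration only) of the
    frequent delta-free sets, mapped to their supports."""
    universe = sorted(set().union(*transactions))
    result = {}
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup >= min_support and is_delta_free(candidate, delta, transactions):
                result[candidate] = sup
    return result


def approximate_support(itemset, free_sets):
    """Approximate the support of a frequent itemset by the smallest support
    among the stored free sets it contains (the empty set is always stored)."""
    items = frozenset(itemset)
    return min(sup for free, sup in free_sets.items() if free <= items)


if __name__ == "__main__":
    for delta in (0, 1):
        free = frequent_free_sets(min_support=2, delta=delta, transactions=TRANSACTIONS)
        estimate = approximate_support({"a", "b", "c"}, free)
        exact = support({"a", "b", "c"}, TRANSACTIONS)
        print(delta, len(free), estimate, exact)
```

On this toy data, δ = 0 keeps six sets instead of the eight frequent itemsets and recovers supports exactly, while δ = 1 keeps only three sets at the cost of a small error in the estimated supports. This mirrors, on a very small scale, the trade-off between output size and approximation error that the abstract describes.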