Adaptive strategies for mining the positive border of interesting patterns: application to inclusion dependencies in databases

Authors:
Fabien De Marchi;Frédéric Flouvat;Jean-Marc Petit
Affiliations:
LIRIS, UMR CNRS 5205, Université Lyon 1, Villeurbanne, France;LIMOS, UMR CNRS 6158, Université Clermont-Ferrand II, Aubière, France;LIMOS, UMR CNRS 6158, Université Clermont-Ferrand II, Aubière, France
Venue:
Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases
Year:
2004

Citing 18
Cited 4

The implication problem for functional and inclusion dependencies

Information and Control
Approximate inference of functional dependencies from relations

ICDT '92 Selected papers of the fourth international conference on Database theory
Efficiently mining long patterns from databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
New results on monotone dualization and generating hypergraph transversals

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
A Guided Tour of Relational Databases and Beyond

A Guided Tour of Relational Databases and Beyond
Levelwise Search and Borders of Theories in Knowledge Discovery

Data Mining and Knowledge Discovery
Discovering interesting inclusion dependencies: application to logical database tuning

Information Systems - Databases: Creation, management and utilization
Analysis of existing databases at the logical level: the DBA companion project

ACM SIGMOD Record
Efficient Algorithms for Mining Inclusion Dependencies

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Pincer Search: A New Algorithm for Discovering the Maximum Frequent Set

EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology
MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases

Proceedings of the 17th International Conference on Data Engineering
Efficiently Mining Maximal Frequent Itemsets

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Discovering all most specific sentences

ACM Transactions on Database Systems (TODS)
Adaptive and Resource-Aware Mining of Frequent Sets

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A Fast Algorithm for Computing Hypergraph Transversals and its Application in Mining Emerging Patterns

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
DBA Companion: A Tool for Logical Database Tuning

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
An efficient implementation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation

Discrete Applied Mathematics - Special issue: Discrete algorithms and optimization, in honor of professor Toshihide Ibaraki at his retirement from Kyoto University

Mining multiple-level fuzzy blocks from multidimensional data

Fuzzy Sets and Systems
Constrained colocation mining: application to soil erosion characterization

Proceedings of the 2010 ACM Symposium on Applied Computing
iZi: a new toolkit for pattern mining problems

ISMIS'08 Proceedings of the 17th international conference on Foundations of intelligent systems
The iZi project: easy prototyping of interesting pattern mining algorithms

PAKDD'09 Proceedings of the 13th Pacific-Asia international conference on Knowledge discovery and data mining: new frontiers in applied data mining

Abstract

Given the theoretical framework of Mannila and Toivonen [26], we are interested in the discovery of the positive border of interesting patterns, also called the most specific interesting patterns. Many approaches have been proposed, among which we mention the levelwise algorithm and the Dualize and Advance algorithm. In this paper, we propose an adaptive strategy, complementary to these two algorithms, based on four steps: 1) To initialize the discovery, eliciting some elements of the negative border, for instance using a levelwise strategy up to a certain level k. 2) From the negative border found so far, inferring by dualization the optimistic positive border, i.e. the set of patterns all of whose specializations are known to be uninteresting. 3) Estimating the distance between the positive border to be discovered and the optimistic positive border. 4) Based on these estimates, carrying out an adaptive search, either bottom-up (the jump was too optimistic) or top-down (the solution should be very close). We have instantiated this proposal on the problem of inclusion dependency (IND) discovery. INDs generalize the well-known concept of foreign keys in databases and are very important in practice. We first show how the problem of IND discovery fits into the theoretical framework of [26]. We then describe an instantiation of our adaptive strategy for IND discovery, called Zigzag, and report experiments conducted with it on synthetic databases. The underlying application of this work takes place in a project called DBA Companion, devoted to the understanding of existing databases at the logical level using data mining techniques.
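Step 2 (dualization) can be illustrated with a minimal sketch, assuming patterns are represented as subsets of a finite set of items and interestingness is anti-monotone. Under that assumption, a pattern avoids every element of the known negative border exactly when its complement hits each of them, so the optimistic positive border is the set of complements of the minimal transversals of the negative border. The function names below (`minimal_transversals`, `optimistic_positive_border`) are hypothetical, and the simple incremental transversal computation is used for illustration only; it is not the algorithm used by Zigzag.

```python
def minimal_transversals(edges):
    """Return the minimal hitting sets of a collection of sets (hyperedges).

    Simple incremental (Berge-style) computation: process one edge at a
    time, extend the transversals that miss it, then keep the minimal ones.
    """
    transversals = {frozenset()}
    for edge in edges:
        candidates = set()
        for t in transversals:
            if t & edge:
                # t already hits this edge: keep it unchanged
                candidates.add(t)
            else:
                # extend t with each vertex of the missed edge
                for v in edge:
                    candidates.add(t | {v})
        # discard candidates that strictly contain another candidate
        transversals = {
            c for c in candidates if not any(o < c for o in candidates)
        }
    return transversals


def optimistic_positive_border(universe, negative_border):
    """Maximal patterns containing no element of the known negative border.

    Assumes patterns are subsets of `universe` (illustrative assumption).
    """
    universe = frozenset(universe)
    edges = [frozenset(e) for e in negative_border]
    return {universe - t for t in minimal_transversals(edges)}


# Toy example: items A..D, with AB and CD known to be minimal
# non-interesting patterns.
border = optimistic_positive_border("ABCD", [{"A", "B"}, {"C", "D"}])
print(sorted("".join(sorted(p)) for p in border))
```

In the toy example, every pattern of size three contains either AB or CD, so the optimistic positive border consists of the four pairs AC, AD, BC and BD. In the adaptive strategy, these candidates would then be checked against the data to decide whether to continue bottom-up or top-down.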