Zigzag: a new algorithm for mining large inclusion dependencies in databases

Authors:
Fabien De Marchi; Jean-Marc Petit
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

In the relational model, inclusion dependencies (INDs) convey much information on data semantics. They generalize foreign keys, which are very popular constraints in practice. However, one seldom knows the set of satisfied INDs in a database. The IND discovery problem in existing databases can be formulated as a data-mining problem. We underline in this article that the exploration of IND expressions from most general (smallest) INDs to most specific (largest) INDs does not succeed whenever large INDs have to be discovered. To cope with this problem, we introduce a new algorithm, called Zigzag, which combines the strength of levelwise algorithms (to find out some smallest INDs) with an optimistic criterion to jump more or less to largest INDs. Preliminary tests, on synthetic databases, are presented and commented on. It is worth noting that the main result of this paper is general enough to be applied to other data-mining problems, such as maximal frequent itemset mining.
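
To make the levelwise exploration mentioned in the abstract concrete, the following is a minimal Python sketch of a bottom-up (smallest to largest) IND search between two relations. It is not the Zigzag algorithm and is not taken from the paper: the function names (`project`, `ind_holds`, `levelwise_inds`), the representation of relations as lists of dicts, and the encoding of an IND as a sorted tuple of (left attribute, right attribute) pairs are all assumptions made for illustration.

```python
from itertools import combinations


def project(rows, attrs):
    """Return the set of distinct tuples of `rows` restricted to `attrs`."""
    return {tuple(row[a] for a in attrs) for row in rows}


def ind_holds(r, x, s, y):
    """True if the IND r[x] ⊆ s[y] is satisfied by the two relations."""
    return project(r, x) <= project(s, y)


def levelwise_inds(r, r_attrs, s, s_attrs):
    """Hypothetical levelwise search: returns one set of satisfied INDs per size.

    An IND is encoded as a sorted tuple of (left_attr, right_attr) pairs.
    """
    # Level 1: test every unary IND directly against the data.
    unary = {((a, b),) for a in r_attrs for b in s_attrs
             if ind_holds(r, [a], s, [b])}
    levels = [unary] if unary else []
    current = unary
    while current:
        k = len(next(iter(current)))
        # Candidate generation: extend each size-k IND by one satisfied unary
        # IND, never repeating an attribute on either side.
        candidates = set()
        for ind in current:
            for (pair,) in unary:
                a, b = pair
                if any(a == x or b == y for x, y in ind):
                    continue
                candidates.add(tuple(sorted(ind + (pair,))))
        nxt = set()
        for cand in candidates:
            # Pruning: every size-k sub-IND must already be known to hold.
            if all(sub in current for sub in combinations(cand, k)):
                xs = [p[0] for p in cand]
                ys = [p[1] for p in cand]
                if ind_holds(r, xs, s, ys):
                    nxt.add(cand)
        if nxt:
            levels.append(nxt)
        current = nxt
    return levels


if __name__ == "__main__":
    r = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
    s = [{"c": 1, "d": "x"}, {"c": 2, "d": "y"}, {"c": 3, "d": "z"}]
    for level in levelwise_inds(r, ["a", "b"], s, ["c", "d"]):
        print(sorted(level))
```

In this encoding, a satisfied IND over n attribute pairs has 2^n - 1 non-empty sub-INDs, and a purely bottom-up search validates all of them before reaching it. This is one way to read the abstract's remark that smallest-to-largest exploration does not succeed when large INDs have to be discovered.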