Discovering all most specific sentences

Authors:
Dimitrios Gunopulos;Roni Khardon;Heikki Mannila;Sanjeev Saluja;Hannu Toivonen;Ram Sewak Sharma
Affiliations:
Computer Science and Engineering Department, University of California, Riverside, CA;EECS Department, Tufts University, Medford, MA;Department of Computer Science, University of Helsinki, Helsinki, Finland;LSI Logic, Milpitas, CA;Department of Computer Science, University of Helsinki, Helsinki, Finland;Computer Science and Engineering Department, University of California, Riverside, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2003

Abstract

Data mining can be viewed, in many instances, as the task of computing a representation of a theory of a model or a database, in particular by finding a set of maximally specific sentences satisfying some property. We prove some hardness results that rule out simple approaches to solving the problem.

The a priori algorithm has been successfully applied to many instances of the problem. We analyze this algorithm and prove that it is optimal when the maximally specific sentences are "small". We also point out its limitations.

We then present a new algorithm, the Dualize and Advance algorithm, and prove worst-case complexity bounds that are favorable in the general case. Our results use the concept of hypergraph transversals. Our analysis shows that the a priori algorithm can solve the problem of enumerating the transversals of a hypergraph, improving on previously known results in a special case. On the other hand, using results for the general case of the hypergraph transversal enumeration problem, we can show that the Dualize and Advance algorithm has worst-case running time that is sub-exponential in the output size (i.e., the number of maximally specific sentences).

We further show that the problem of finding maximally specific sentences is closely related to the problem of exact learning with membership queries studied in computational learning theory.
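To make the two search strategies named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the simplest instance of the problem, where sentences are itemsets and the property is "occurs in at least two rows", and it uses a brute-force transversal routine in place of any efficient dualization procedure. All names (`levelwise_maximal`, `minimal_transversals`, `dualize_and_advance`, `is_frequent`, the toy `rows`) are hypothetical.

```python
from itertools import combinations

def levelwise_maximal(items, is_interesting):
    """Levelwise (a priori style) search over subsets of `items`.
    Assumes every subset of an interesting set is interesting.
    Returns (maximal interesting sets, number of predicate evaluations)."""
    items = sorted(items)
    interesting, queries = set(), 0
    level = [frozenset()]  # candidates of size 0
    while level:
        current = set()
        for cand in level:
            queries += 1
            if is_interesting(cand):
                current.add(cand)
        interesting |= current
        # A set one item larger is a candidate only if all of its
        # immediate subsets were interesting on the current level.
        next_level = set()
        for s in current:
            for x in items:
                if x not in s:
                    c = s | {x}
                    if all(c - {y} in current for y in c):
                        next_level.add(c)
        level = list(next_level)
    maximal = {s for s in interesting if not any(s < t for t in interesting)}
    return maximal, queries

def minimal_transversals(items, edges):
    """Brute-force minimal transversals (hitting sets) of a hypergraph.
    Exponential in len(items); for illustration only."""
    found = []
    items = sorted(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            t = frozenset(combo)
            # Sets are visited by increasing size, so a transversal that
            # contains no earlier one is minimal.
            if all(t & e for e in edges) and not any(m <= t for m in found):
                found.append(t)
    return set(found)

def greedy_maximal(items, seed, is_interesting):
    """Extend an interesting `seed` to a maximal interesting set."""
    s = set(seed)
    for x in sorted(items):
        if x not in s and is_interesting(frozenset(s | {x})):
            s.add(x)
    return frozenset(s)

def dualize_and_advance(items, is_interesting):
    """Sketch of the dualize-and-advance idea: the minimal transversals of
    the complements of the maximal sets found so far are exactly the minimal
    sets not covered by them; any interesting one yields a new maximal set."""
    universe = frozenset(items)
    maximal = set()
    while True:
        edges = [universe - m for m in maximal]
        witness = next((t for t in minimal_transversals(items, edges)
                        if is_interesting(t)), None)
        if witness is None:
            return maximal
        maximal.add(greedy_maximal(items, witness, is_interesting))

# Toy database (assumed for illustration): five rows over items A-D.
rows = [set("ABC"), set("ABC"), set("BD"), set("BD"), set("AD")]

def is_frequent(itemset):
    """Interestingness predicate: the itemset occurs in at least 2 rows."""
    return sum(1 for r in rows if itemset <= r) >= 2

maximal_lw, queries = levelwise_maximal("ABCD", is_frequent)
maximal_da = dualize_and_advance("ABCD", is_frequent)
border = minimal_transversals("ABCD", [frozenset("ABCD") - m for m in maximal_lw])
print(sorted("".join(sorted(m)) for m in maximal_lw), queries)
print(sorted("".join(sorted(t)) for t in border))
```

On this toy input both routines find the same maximal sets, and the minimal transversals of their complements are the minimal infrequent itemsets, which is the hypergraph connection the abstract refers to. The levelwise search evaluates the predicate once for every interesting set and once for every minimal non-interesting set, which is why it is attractive when the maximal sets are small and costly when they are large.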