Reasoning about sets using redescription mining

Authors:
Mohammed J. Zaki; Naren Ramakrishnan
Affiliations:
Rensselaer Polytechnic Institute, Troy, NY; Virginia Tech, Blacksburg, VA
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we show how redescription mining serves as an important tool for reasoning about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms that mine all minimal (non-redundant) redescriptions underlying a dataset using the notion of minimal generators of closed itemsets. We also show how these algorithms can be used interactively, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
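The link between minimal redescriptions and minimal generators can be illustrated with a small brute-force sketch. It assumes descriptors are named sets of objects (here, hypothetical gene sets), restricts descriptions to conjunctions (set intersections), and treats any two distinct minimal generators with the same extent (and hence the same closed itemset) as a redescription. The function names, the `min_support` parameter, and the toy data are illustrative assumptions; this exhaustive enumeration is not the paper's algorithm and is only feasible for a handful of descriptors.

```python
from itertools import combinations


def extent(itemset, sets, universe):
    """Objects covered by every descriptor in itemset (the whole universe if empty)."""
    result = set(universe)
    for name in itemset:
        result &= sets[name]
    return frozenset(result)


def is_minimal_generator(itemset, sets, universe):
    """True if removing any single descriptor changes the extent.

    Checking only immediate subsets suffices: extents can only shrink as
    descriptors are added, so if some smaller subset had the same extent,
    an immediate subset containing it would as well.
    """
    ext = extent(itemset, sets, universe)
    return all(extent(itemset - {name}, sets, universe) != ext for name in itemset)


def minimal_redescriptions(sets, min_support=1):
    """Pairs of distinct minimal generators that describe the same objects.

    Hypothetical helper: exhaustive over all non-empty descriptor subsets.
    """
    universe = frozenset().union(*sets.values())
    names = sorted(sets)
    # Group minimal generators by extent; each group corresponds to one closed itemset.
    groups = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            itemset = frozenset(combo)
            ext = extent(itemset, sets, universe)
            if len(ext) < min_support:
                continue  # skip descriptions that cover too few objects
            if is_minimal_generator(itemset, sets, universe):
                groups.setdefault(ext, []).append(itemset)
    # Any two generators in the same group are equivalent descriptions.
    pairs = []
    for ext, gens in groups.items():
        for x, y in combinations(gens, 2):
            pairs.append((tuple(sorted(x)), tuple(sorted(y)), ext))
    return pairs


# Hypothetical toy data: four named gene sets over genes 1..6.
gene_sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {3, 4, 6}, "D": {3, 4}}
for left, right, ext in minimal_redescriptions(gene_sets):
    print(" & ".join(left), "<=>", " & ".join(right), sorted(ext))
```

On this toy data the sketch reports six redescriptions, all with extent [3, 4], starting with `D <=> A & B [3, 4]`. Non-minimal conjunctions such as `A & D` are never emitted, because dropping `A` leaves the extent unchanged. The sketch does not impose further constraints (for example, requiring the two sides to use disjoint descriptors), which an interactive, constraint-based setting would typically add.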