Turning CARTwheels: an alternating algorithm for mining redescriptions

Authors:
Naren Ramakrishnan;Deept Kumar;Bud Mishra;Malcolm Potts;Richard F. Helm
Affiliations:
Virginia Tech, VA;Virginia Tech, VA;New York University, NY;Virginia Tech, VA;Virginia Tech, VA
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004



Abstract

We present an unusual algorithm involving classification trees, CARTwheels, in which two trees are grown in opposite directions so that they are joined at their leaves. This approach finds application in a new data mining task we formulate, called redescription mining. A redescription is a shift-of-vocabulary, or a different way of communicating information about a given subset of data; the goal of redescription mining is to find subsets of data that afford multiple descriptions. We highlight the importance of this problem in domains such as bioinformatics, which exhibit an underlying richness and diversity of data descriptors (e.g., genes can be studied in a variety of ways). CARTwheels exploits the duality between class partitions and path partitions in an induced classification tree to model and mine redescriptions. It helps integrate multiple forms of characterizing datasets, situates the knowledge gained from one dataset in the context of others, and harnesses high-level abstractions for uncovering cryptic and subtle features of data. Algorithm design decisions, implementation details, and experimental results are presented.
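To make the idea of a redescription concrete, the following is a minimal sketch of the alternation between two vocabularies. It is not the CARTwheels implementation. It assumes each vocabulary is a set of Boolean descriptors, replaces the classification trees with single-descriptor matches (in effect depth-one splits), and scores agreement with Jaccard similarity. The data, the descriptor names, and the functions `jaccard`, `best_match`, and `alternate` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data: each vocabulary maps a descriptor name to the set of
# objects (made-up gene ids) that satisfy it.
VOCAB_A: Dict[str, Set[str]] = {
    "expressed_under_heat": {"g1", "g2", "g3"},
    "expressed_under_cold": {"g4", "g5"},
    "expressed_under_starvation": {"g1", "g4"},
}
VOCAB_B: Dict[str, Set[str]] = {
    "annotated_stress_response": {"g1", "g2", "g3"},
    "annotated_transport": {"g3", "g4", "g5"},
    "annotated_metabolism": {"g2", "g5"},
}


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Jaccard similarity |x & y| / |x | y|; taken as 0.0 when both sets are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def best_match(target: Set[str], vocab: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the descriptor in vocab whose object set best matches target."""
    name = max(vocab, key=lambda d: jaccard(target, vocab[d]))
    return name, jaccard(target, vocab[name])


def alternate(start: str, max_rounds: int = 10) -> Tuple[str, str, float]:
    """Alternate between the two vocabularies until the matched pair is stable.

    Assumption: a single best-matching descriptor stands in for the tree that
    would be grown on each side in a tree-based method.
    """
    a_name = start
    b_name, score = best_match(VOCAB_A[a_name], VOCAB_B)
    for _ in range(max_rounds):
        # Use side B's object set as the target and re-fit side A, then the reverse.
        new_a, _ = best_match(VOCAB_B[b_name], VOCAB_A)
        new_b, score = best_match(VOCAB_A[new_a], VOCAB_B)
        if (new_a, new_b) == (a_name, b_name):
            break
        a_name, b_name = new_a, new_b
    return a_name, b_name, score


if __name__ == "__main__":
    print(alternate("expressed_under_starvation"))
```

In this toy data, starting from `expressed_under_starvation` the alternation settles on the pair `expressed_under_heat` and `annotated_stress_response`, which cover exactly the same three objects and so describe one subset in two vocabularies. A tree-based method could express richer descriptions (conjunctions and disjunctions along tree paths) than the single descriptors used here.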