We present CARTwheels, an unusual algorithm involving classification trees in which two trees are grown in opposite directions so that they are joined at their leaves. This approach finds application in a new data mining task we formulate, called redescription mining. A redescription is a shift of vocabulary, that is, a different way of communicating information about a given subset of data; the goal of redescription mining is to find subsets of data that afford multiple descriptions. We highlight the importance of this problem in domains such as bioinformatics, which exhibit an underlying richness and diversity of data descriptors (e.g., genes can be studied in a variety of ways). CARTwheels exploits the duality between class partitions and path partitions in an induced classification tree to model and mine redescriptions. It helps integrate multiple forms of characterizing datasets, situates the knowledge gained from one dataset in the context of others, and harnesses high-level abstractions for uncovering cryptic and subtle features of data. We present algorithm design decisions, implementation details, and experimental results.
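To make the notion of a redescription concrete, the following is a minimal sketch and not the CARTwheels algorithm itself. It assumes that each vocabulary is given as a mapping from descriptor names to the set of objects they cover, and that two descriptors count as a redescription when the Jaccard similarity of their object sets meets a threshold. The function names, the threshold parameter, and the toy gene data are all hypothetical, introduced only for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_redescriptions(vocab_x, vocab_y, min_jaccard=1.0):
    """Brute-force search (an assumption, not the paper's method) for pairs of
    descriptors, one from each vocabulary, that cover nearly the same objects.

    Returns a list of (name_x, name_y, score), highest score first.
    """
    results = []
    for (name_x, set_x), (name_y, set_y) in product(vocab_x.items(), vocab_y.items()):
        score = jaccard(set_x, set_y)
        if score >= min_jaccard:
            results.append((name_x, name_y, score))
    return sorted(results, key=lambda r: -r[2])


# Hypothetical toy data: five genes described in two different vocabularies.
by_function = {
    "stress_response": {"g1", "g2", "g3"},
    "transport": {"g4", "g5"},
}
by_expression = {
    "up_in_heat_shock": {"g1", "g2", "g3"},
    "up_in_starvation": {"g3", "g4", "g5"},
}

matches = find_redescriptions(by_function, by_expression, min_jaccard=0.6)
for name_x, name_y, score in matches:
    print(f"{name_x} <=> {name_y} (Jaccard {score:.2f})")
```

With a threshold of 1.0 only exact redescriptions are reported; lowering it admits approximate ones. An exhaustive pairwise comparison like this only handles single descriptors, whereas the abstract describes growing two classification trees joined at their leaves, which allows richer descriptions built from paths through each tree.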