Much research on inductive databases (IDBs) focuses on local models, such as item sets and association rules. In this work, we investigate how IDBs can support global models, such as decision trees. Our focus is on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for both prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down with a greedy algorithm similar to that of C4.5. We propose a new induction algorithm for PCTs based on beam search. It has three advantages over the regular method: (a) it returns a set of PCTs satisfying the user constraints instead of just one PCT; (b) it makes it easier to push user constraints into the induction algorithm; and (c) it is less susceptible to myopia. In addition, we propose similarity constraints for PCTs, which improve the diversity of the resulting PCT set.
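The contrast between greedy top-down induction and beam search can be made concrete with a small sketch. The code below is not the authors' algorithm: the tuple-based tree representation, the squared-error heuristic, the size penalty `alpha`, and the `max_size` constraint are all assumptions chosen for illustration, and the similarity constraints mentioned above are left out. It keeps the `beam_width` best trees at each step, refines every leaf of every tree in the beam, and discards refinements that violate the size constraint before they are scored.

```python
# A tree is ("leaf", rows) or ("node", feature, threshold, left, right).
# Each row is a (features_tuple, target) pair. All names are hypothetical.

def sse(rows):
    """Sum of squared errors of the targets around their mean."""
    if not rows:
        return 0.0
    mean = sum(y for _, y in rows) / len(rows)
    return sum((y - mean) ** 2 for _, y in rows)

def size(tree):
    """Number of nodes, counting leaves."""
    return 1 if tree[0] == "leaf" else 1 + size(tree[3]) + size(tree[4])

def error(tree):
    """Total training error: sum of the leaf SSEs."""
    return sse(tree[1]) if tree[0] == "leaf" else error(tree[3]) + error(tree[4])

def score(tree, alpha):
    """Assumed heuristic: training error plus a penalty per node (lower is better)."""
    return error(tree) + alpha * size(tree)

def shape(tree):
    """Structural key (splits only), used to detect duplicate trees."""
    if tree[0] == "leaf":
        return None
    return (tree[1], tree[2], shape(tree[3]), shape(tree[4]))

def refinements(tree):
    """Yield every tree obtained by replacing exactly one leaf with one split."""
    if tree[0] == "leaf":
        rows = tree[1]
        for f in range(len(rows[0][0])):
            values = sorted({x[f] for x, _ in rows})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2  # midpoint between adjacent distinct values
                left = [r for r in rows if r[0][f] <= t]
                right = [r for r in rows if r[0][f] > t]
                yield ("node", f, t, ("leaf", left), ("leaf", right))
    else:
        _, f, t, left, right = tree
        for new_left in refinements(left):
            yield ("node", f, t, new_left, right)
        for new_right in refinements(right):
            yield ("node", f, t, left, new_right)

def beam_search(rows, beam_width=3, max_size=7, alpha=0.1):
    """Return up to beam_width trees, best first, all with size <= max_size."""
    beam = [("leaf", rows)]
    while True:
        # Current beam members stay candidates, so the beam never gets worse.
        candidates = {shape(t): t for t in beam}
        for tree in beam:
            for cand in refinements(tree):
                if size(cand) <= max_size:  # constraint checked inside the search
                    candidates.setdefault(shape(cand), cand)
        ranked = sorted(candidates.values(), key=lambda t: score(t, alpha))
        new_beam = ranked[:beam_width]
        if [shape(t) for t in new_beam] == [shape(t) for t in beam]:
            return new_beam  # no refinement entered the beam: stop
        beam = new_beam

# Toy data: one feature, target jumps between x = 2 and x = 3.
data = [((1,), 0.0), ((2,), 0.0), ((3,), 10.0), ((4,), 10.0)]
trees = beam_search(data, beam_width=3, max_size=3, alpha=0.1)
```

With `beam_width=1` the sketch reduces to a greedy search that commits to a single tree at each step; a wider beam keeps alternative trees alive and returns all of them, which is the sense in which the result is a set rather than a single model.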