Beam search induction and similarity constraints for predictive clustering trees

  • Authors:
  • Dragi Kocev; Jan Struyf; Sašo Džeroski

  • Affiliations:
  • Dept. of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia; Dept. of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium; Dept. of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia

  • Venue:
  • KDID'06 Proceedings of the 5th international conference on Knowledge discovery in inductive databases
  • Year:
  • 2006

Abstract

Much research on inductive databases (IDBs) focuses on local models, such as item sets and association rules. In this work, we investigate how IDBs can support global models, such as decision trees. Our focus is on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for both prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down using a greedy algorithm similar to that of C4.5. We propose a new induction algorithm for PCTs based on beam search. This has three advantages over the regular method: (a) it returns a set of PCTs satisfying the user constraints instead of just one PCT; (b) it makes it easier to push user constraints into the induction algorithm; and (c) it is less susceptible to myopia. In addition, we propose similarity constraints for PCTs, which improve the diversity of the resulting PCT set.
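
To make the contrast with greedy induction concrete, here is a minimal sketch of generic beam search. It is an illustration under stated assumptions, not the algorithm from the paper. The names `beam_search`, `refine`, `score`, and `admissible` are hypothetical, and plain integers stand in for candidate trees. A greedy learner corresponds roughly to `beam_width=1`, whereas a wider beam keeps several candidates alive and returns a set of models. The `admissible` predicate shows one way a user constraint could be applied during the search rather than afterwards.

```python
from typing import Callable, Iterable, List, TypeVar

M = TypeVar("M")  # a candidate model; in the paper's setting this would be a tree


def beam_search(
    initial: M,
    refine: Callable[[M], Iterable[M]],   # hypothetical: one-step refinements of a model
    score: Callable[[M], float],          # hypothetical: higher is better
    admissible: Callable[[M], bool],      # hypothetical: user constraint check
    beam_width: int,
    max_steps: int,
) -> List[M]:
    """Generic beam search: keep the `beam_width` best admissible models per step."""
    beam: List[M] = [initial]
    for _ in range(max_steps):
        # Current models compete with all of their refinements.
        candidates: List[M] = list(beam)
        for model in beam:
            candidates.extend(m for m in refine(model) if admissible(m))
        # Drop duplicates while preserving order (models must be hashable here).
        candidates = list(dict.fromkeys(candidates))
        new_beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if new_beam == beam:  # no refinement improved the beam: stop
            break
        beam = new_beam
    return beam


# Toy usage: integers stand in for models, and the "best" model is closest to 10.
result = beam_search(
    initial=1,
    refine=lambda x: [x + 1, x * 2],
    score=lambda x: -abs(x - 10),
    admissible=lambda x: x <= 12,
    beam_width=2,
    max_steps=10,
)
print(result)
```

In this toy setup the search returns the two best admissible candidates rather than a single one, which mirrors advantage (a) above. A similarity constraint could be approximated by penalizing candidates that are too close to models already in the beam, but that is not shown here.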