Redescription mining: structure theory and algorithms

  • Authors: Laxmi Parida; Naren Ramakrishnan
  • Affiliations: IBM Thomas J. Watson Research Center, Yorktown Heights, NY; Department of Computer Science, Virginia Tech, VA
  • Venue: AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
  • Year: 2005

Abstract

We introduce a new data mining problem, redescription mining, that unifies considerations of conceptual clustering, constructive induction, and logical formula discovery. Redescription mining begins with a collection of sets, views it as a propositional vocabulary, and identifies clusters of data that can be defined in at least two ways using this vocabulary. The primary contributions of this paper are conceptual and theoretical: (i) we formally study the space of redescriptions underlying a dataset and characterize their intrinsic structure, (ii) we identify impossibility as well as strong possibility results about when mining redescriptions is feasible, (iii) we present several scenarios of how we can custom-build redescription mining solutions for various biases, and (iv) we outline how many problems studied in the larger machine learning community are really special cases of redescription mining. By highlighting its broad scope and relevance, we aim to establish the importance of redescription mining and make the case for a thrust in this new line of research.
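
To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the algorithms studied in the paper. It assumes the vocabulary is a dictionary of named sets, restricts expressions to conjunctions (intersections) of at most `max_terms` descriptors, and reports every non-empty cluster that at least two distinct expressions define. The function name `find_redescriptions`, the `max_terms` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def find_redescriptions(descriptors, max_terms=2):
    """Group conjunctive expressions over named sets by the cluster they define.

    descriptors: dict mapping a descriptor name to a set of object ids.
    Returns a dict mapping each non-empty cluster (frozenset) to the list of
    expressions (tuples of descriptor names, read as intersections) that
    define it, keeping only clusters defined in at least two ways.
    """
    by_extent = {}
    names = sorted(descriptors)
    for k in range(1, max_terms + 1):
        for combo in combinations(names, k):
            # The cluster defined by a conjunction is the intersection of its sets.
            extent = frozenset.intersection(
                *(frozenset(descriptors[name]) for name in combo)
            )
            if extent:
                by_extent.setdefault(extent, []).append(combo)
    # A redescription exists when two or more expressions share the same cluster.
    return {ext: exprs for ext, exprs in by_extent.items() if len(exprs) >= 2}


if __name__ == "__main__":
    # Hypothetical toy vocabulary over objects 1..5.
    toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {2, 3}, "D": {3, 4, 5}}
    for cluster, expressions in find_redescriptions(toy).items():
        print(sorted(cluster), [" AND ".join(e) for e in expressions])
```

In this toy vocabulary, the cluster {2, 3} is defined both by the single descriptor C and by the conjunction A AND B, which is the kind of pairing the abstract calls a redescription. The exhaustive enumeration grows combinatorially with the vocabulary size, so it only illustrates the definition.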