Generalization-based similarity for conceptual clustering

  • Authors:
  • S. Ferilli;T. M. A. Basile;N. Di Mauro;M. Biba;F. Esposito

  • Affiliations:
  • Dipartimento di Informatica, Università di Bari, Bari, Italia;Dipartimento di Informatica, Università di Bari, Bari, Italia;Dipartimento di Informatica, Università di Bari, Bari, Italia;Dipartimento di Informatica, Università di Bari, Bari, Italia;Dipartimento di Informatica, Università di Bari, Bari, Italia

  • Venue:
  • MCD'07 Proceedings of the 3rd ECML/PKDD international conference on Mining complex data
  • Year:
  • 2007

Abstract

Knowledge extraction concerns the ability to identify valid, potentially useful and understandable patterns in large data collections. The task becomes more difficult when the application domain cannot be captured by an attribute-value representation, so that a more powerful representation language, such as First-Order Logic, is necessary. Handling First-Order Logic formulæ is complex, because the presence of relations allows portions of one description to be mapped onto another description in several different ways; as a result, few techniques for comparing descriptions in this kind of representation are available in the literature. Nevertheless, the ability to assess similarity between first-order descriptions has many applications, ranging from description selection to flexible matching, and from instance-based learning to clustering. This paper tackles the case of Conceptual Clustering, where a new approach to similarity evaluation, based on both syntactic and semantic features, is exploited to support the task of grouping together similar items according to their relational descriptions. After presenting a similarity framework for Horn Clauses (including criteria, a function and composition techniques for similarity assessment), classical clustering algorithms are exploited to carry out the grouping task. Experimental results on real-world datasets demonstrate the effectiveness of the proposal.
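The overall pipeline described in the abstract (compute pairwise similarity between relational descriptions, then hand the result to a classical clustering algorithm) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' method: descriptions are simplified to sets of ground atoms whose constants are already aligned (which sidesteps the mapping problem the paper addresses), the Jaccard index stands in for the paper's similarity function, and the hypothetical helper `single_link_clusters` is a plain threshold-based single-link grouping.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (NOT the paper's function): shared atoms / all atoms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_link_clusters(items, sim, threshold):
    """Group items so that any pair with similarity >= threshold shares a cluster.

    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    parent = list(range(len(items)))  # union-find forest over item indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if sim(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy relational descriptions (illustrative data, not from the paper).
descriptions = [
    {("on", "a", "b"), ("red", "a"), ("square", "b")},
    {("on", "a", "b"), ("red", "a"), ("circle", "b")},
    {("left_of", "a", "b"), ("blue", "a")},
    {("left_of", "a", "b"), ("blue", "a"), ("large", "b")},
]

clusters = single_link_clusters(descriptions, jaccard, threshold=0.5)
print(clusters)
```

Because the clustering step only needs a function that scores two descriptions, a similarity measure designed for Horn Clauses could in principle be passed in place of `jaccard` without changing the grouping code.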