Distance based generalisation

  • Authors:
  • V. Estruch;C. Ferri;J. Hernández-Orallo;M. J. Ramírez-Quintana

  • Affiliations:
  • DSIC, Univ. Politècnica de València, València, Spain;DSIC, Univ. Politècnica de València, València, Spain;DSIC, Univ. Politècnica de València, València, Spain;DSIC, Univ. Politècnica de València, València, Spain

  • Venue:
  • ILP'05 Proceedings of the 15th international conference on Inductive Logic Programming
  • Year:
  • 2005

Abstract

Many distance-based methods in machine learning can identify similar cases or prototypes from which decisions are made. The explanation they give is usually of the form “because case a is similar to case b”. A more general or meaningful pattern, such as “because case a has properties x and y (as b has)”, is usually harder to find. Even when such a pattern is found, its connection with the original distance-based method is generally unclear, or even inconsistent. In this paper, we study the connection between the concept of distance (or similarity) and the concept of generalisation. More precisely, we define several conditions which, in our view, a sensible distance-based generalisation must satisfy. From these conditions, we can tell whether a generalisation operator for a pattern representation language is consistent with the metric space defined by the underlying distance. We show that there are pattern languages and generalisation operators which comply with these properties for typical data types: nominal, numerical, sets and lists. We also show how the well-known concepts of least general generalisation (lgg) and distances between terms relate to the definition of generalisation presented in this paper.
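
The abstract does not state the conditions themselves, so the following Python sketch only illustrates the general idea of checking a generalisation against a distance. It assumes one plausible condition, that a pattern generalising a and b should cover every element c lying "between" them, i.e. with d(a, c) + d(c, b) = d(a, b). This condition, and all names below (`interval_pattern`, `set_pattern`, `is_between`, `covers_intermediate`), are illustrative assumptions and not the paper's definitions.

```python
from typing import Callable, Iterable


def abs_distance(a: float, b: float) -> float:
    """Absolute difference, the usual distance on numerical values."""
    return abs(a - b)


def discrete_distance(a: str, b: str) -> int:
    """Discrete distance on nominal values: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1


def interval_pattern(a: float, b: float) -> Callable[[float], bool]:
    """Generalise two numbers to the closed interval spanning them."""
    lo, hi = min(a, b), max(a, b)
    return lambda x: lo <= x <= hi


def set_pattern(a: str, b: str) -> Callable[[str], bool]:
    """Generalise two nominal values to the set containing both."""
    members = {a, b}
    return lambda x: x in members


def is_between(d: Callable, a, b, c, tol: float = 1e-9) -> bool:
    """Assumed notion of 'between': c lies on a shortest path from a to b."""
    return abs(d(a, c) + d(c, b) - d(a, b)) <= tol


def covers_intermediate(d: Callable, pattern: Callable, a, b,
                        universe: Iterable) -> bool:
    """Check that the pattern covers every element of a finite sample
    universe that lies between a and b under the distance d."""
    return all(pattern(c) for c in universe if is_between(d, a, b, c))


numbers = [0, 1, 2, 2.5, 4, 5, 7]
colours = ["red", "green", "blue"]

# The interval [1, 4] covers every sampled number between 1 and 4.
interval_ok = covers_intermediate(
    abs_distance, interval_pattern(1, 4), 1, 4, numbers)

# A pattern listing only the two endpoints misses 2 and 2.5.
endpoints_ok = covers_intermediate(
    abs_distance, lambda x: x in {1, 4}, 1, 4, numbers)

# Under the discrete distance nothing lies strictly between two values,
# so the two-element set is already sufficient.
nominal_ok = covers_intermediate(
    discrete_distance, set_pattern("red", "blue"), "red", "blue", colours)
```

Under this assumed condition, the same pattern language can be acceptable for one distance and not for another: the endpoint set passes for nominal values with the discrete distance but fails for numbers with the absolute difference.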