An optimal algorithm for similarity based entity association

  • Authors:
  • Olivier Brissac

  • Affiliations:
  • IREMIA, Faculté des Sciences, La REUNION. France

  • Venue:
  • ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
  • Year:
  • 1995

Abstract

Some inductive learning systems allow structural descriptions of examples, built from elementary entities whose values can be either symbolic or numerical. This paper studies such systems, especially those that use a similarity measure defined between elementary entities. As is the case in our system [12], [13], the matching step of the learning process leads to a generalization of the training set. The main benefit of a similarity-based matching step is that it allows symbolic as well as numeric values to be processed. Given a pair of examples, learning algorithms that use a similarity measure begin by computing this measure for all pairs of entities, one taken from each example. When it comes to choosing which entities are to be associated, a greedy method is used: entities are associated in pairs, in decreasing order of similarity. This paper proposes an alternative to this greedy choice, based on a network flow algorithm that provides an optimal result with respect to the given similarity function. Furthermore, we study a generalization of this approach and show the general case to be NP-complete. After a discussion of the theoretical and practical use of the optimal method, we give some directions for further work.
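
The gap between the greedy choice and an optimal association can be seen on a very small example. The sketch below is illustrative only and is not the paper's algorithm: the similarity matrix is made up, the function names are hypothetical, "optimal" is assumed to mean maximizing the sum of similarities over a one-to-one association, and the optimum is found by exhaustive search over permutations rather than by the network flow algorithm the abstract refers to. Exhaustive search is only practical for a handful of entities.

```python
from itertools import permutations

def greedy_association(sim):
    """Associate entities pair by pair, in decreasing order of similarity."""
    candidates = sorted(
        ((sim[i][j], i, j) for i in range(len(sim)) for j in range(len(sim[0]))),
        key=lambda t: -t[0],
    )
    used_left, used_right, chosen = set(), set(), []
    for _score, i, j in candidates:
        # Skip a pair if either entity is already associated.
        if i not in used_left and j not in used_right:
            used_left.add(i)
            used_right.add(j)
            chosen.append((i, j))
    return sorted(chosen)

def optimal_association(sim):
    """Best one-to-one association by exhaustive search.

    Assumes a square matrix (same number of entities in both examples).
    This stands in for a network flow solver on a toy input only.
    """
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda p: sum(sim[i][p[i]] for i in range(n)),
    )
    return [(i, best[i]) for i in range(n)]

def total_similarity(sim, pairs):
    """Sum of similarities over the associated pairs."""
    return sum(sim[i][j] for i, j in pairs)

# Hypothetical similarities: rows are entities of the first example,
# columns are entities of the second example.
sim = [
    [0.9, 0.8],
    [0.8, 0.1],
]

greedy = greedy_association(sim)
optimal = optimal_association(sim)
print("greedy :", greedy, total_similarity(sim, greedy))
print("optimal:", optimal, total_similarity(sim, optimal))
```

On this input the greedy method first takes the most similar pair (0, 0), which forces the poor pair (1, 1), whereas the exhaustive search selects (0, 1) and (1, 0) and reaches a higher total similarity.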