An optimal algorithm for similarity based entity association

Authors:
Olivier Brissac
Affiliations:
IREMIA, Faculté des Sciences, La REUNION. France
Venue:
ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
Year:
1995

Citing 5

Conceptual structures: information processing in mind and machine

OGUST: a system that learns using domain properties expressed as the theorems
Machine learning

Classification in Noisy Environments Using a Distance Measure Between Structural Symbolic Descriptions
IEEE Transactions on Pattern Analysis and Machine Intelligence

Network flows: theory, algorithms, and applications

A genuinely polynomial primal simplex algorithm for the assignment problem
Discrete Applied Mathematics

Abstract

Some inductive learning systems support structural descriptions of examples, built from elementary entities whose values can be either symbolic or numerical. This paper studies such systems, especially those that use a similarity measure defined between elementary entities. As is the case in our system [12], [13], the matching step of the learning process leads to a generalization of the training set. The main benefit of a similarity-based matching step is that it can process symbolic as well as numeric values. Given a pair of examples, learning algorithms that use a similarity measure begin by computing this measure for every pair of entities, one taken from each example. When it comes to choosing which entities are to be associated, a greedy method is used: entities are associated in pairs, in decreasing order of similarity. This paper proposes an alternative to this greedy choice, based on a network flow algorithm that provides an optimal result with respect to the given similarity function. Furthermore, we study a generalization of this approach and show that the general case is NP-complete. After a discussion of the theoretical and practical use of the optimal method, we give some directions for further work.
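
The contrast between the greedy choice and an optimal, flow-based choice can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say which network flow algorithm is used, so the sketch swaps in a standard min-cost flow method (successive shortest augmenting paths found with Bellman-Ford). The function names and the similarity matrix are hypothetical, and the sketch assumes non-negative similarities and a one-to-one association between entities.

```python
def greedy_association(sim):
    """Greedy baseline: associate pairs in decreasing similarity order."""
    candidates = sorted(
        ((sim[i][j], i, j) for i in range(len(sim)) for j in range(len(sim[0]))),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            used_rows.add(i)
            used_cols.add(j)
            pairs.append((i, j))
    return sorted(pairs)


def optimal_association(sim):
    """Pairing with maximum total similarity, computed as a min-cost flow.

    Assumption: similarities are non-negative. This is a generic textbook
    method, not necessarily the algorithm used in the paper.
    """
    n, m = len(sim), len(sim[0])
    source, sink = n + m, n + m + 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        add_edge(source, i, 0)
        for j in range(m):
            add_edge(i, n + j, -sim[i][j])  # negate: min cost = max similarity
    for j in range(m):
        add_edge(n + j, sink, 0)

    for _unit in range(min(n, m)):
        # Bellman-Ford shortest path in the residual graph (costs may be negative).
        dist = [float("inf")] * len(graph)
        parent = [None] * len(graph)
        dist[source] = 0
        for _pass in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, k)
                        changed = True
            if not changed:
                break
        if parent[sink] is None:
            break
        # Push one unit of flow back along the path found.
        v = sink
        while v != source:
            u, k = parent[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u

    # Saturated entity-to-entity edges are the chosen associations.
    return sorted(
        (i, e[0] - n) for i in range(n) for e in graph[i]
        if n <= e[0] < n + m and e[1] == 0
    )


def total_similarity(sim, pairs):
    return sum(sim[i][j] for i, j in pairs)


# Hypothetical similarity matrix (rows: entities of example A, columns: example B).
sim = [[9, 8],
       [8, 1]]
print(greedy_association(sim))   # [(0, 0), (1, 1)], total similarity 10
print(optimal_association(sim))  # [(0, 1), (1, 0)], total similarity 16
```

In this toy matrix the greedy method takes the single best pair first and is then left with the worst one, whereas the flow-based pairing gives up the best pair to obtain a higher total.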