Generalized Monge-Elkan Method for Approximate Text String Comparison

  • Authors:
  • Sergio Jimenez; Claudia Becerra; Alexander Gelbukh; Fabio Gonzalez

  • Affiliations:
  • Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia; Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia; Natural Language Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico; Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

The Monge-Elkan method is a general text string comparison method that combines an internal character-based similarity measure (e.g., edit distance) with a token-level (i.e., word-level) similarity measure. We propose a generalization of this method that replaces the simple average in the Monge-Elkan expression with the generalized arithmetic mean. Experiments on 12 well-known name-matching data sets show that the proposed approach outperforms the original Monge-Elkan method when character-based measures are used to compare tokens.
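
To make the idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the generalized mean is a power mean with exponent m applied to each token's best match score, and it assumes a normalized edit-distance similarity as the internal measure; the abstract gives neither expression. The function names, the whitespace tokenization, and the default m = 2 are hypothetical choices made for illustration.

```python
def levenshtein(s, t):
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(s, t):
    """Assumed internal measure: edit distance mapped into [0, 1]."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))


def generalized_monge_elkan(a, b, m=2.0, sim=edit_similarity):
    """Power mean (exponent m > 0) of each token's best match in b.

    With m = 1 this reduces to the simple average of the original
    Monge-Elkan method. Note that the measure is not symmetric in a and b.
    """
    if m <= 0:
        raise ValueError("this sketch only handles m > 0")
    tokens_a = a.split()  # assumption: whitespace tokenization
    tokens_b = b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(sim(x, y) for y in tokens_b) for x in tokens_a]
    return (sum(v ** m for v in best) / len(best)) ** (1.0 / m)


if __name__ == "__main__":
    print(generalized_monge_elkan("jon smith", "john smith", m=1))
    print(generalized_monge_elkan("jon smith", "john smith", m=2))
```

In this sketch, exponents above 1 weight well-matched tokens more heavily than poorly matched ones, so the score for m = 2 is never lower than the score for m = 1 on the same pair of strings.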