Generalized Mongue-Elkan Method for Approximate Text String Comparison

Authors:
Sergio Jimenez; Claudia Becerra; Alexander Gelbukh; Fabio Gonzalez
Affiliations:
Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia; Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia; Natural Language Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico; Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

The Monge-Elkan method is a general text string comparison method that combines an internal character-based similarity measure (e.g., edit distance) with a token-level (i.e., word-level) similarity measure. We propose a generalization of this method that replaces the simple average used in the Monge-Elkan expression with the generalized arithmetic mean. Experiments carried out with 12 well-known name-matching data sets show that the proposed approach outperforms the original Monge-Elkan method when character-based measures are used to compare tokens.
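The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, whitespace tokenization, the normalized Levenshtein similarity used as the internal measure, and the default exponent m = 2 are all assumptions made for illustration. Each token of the first string is matched to its most similar token in the second string, and the best-match scores are combined with a generalized (power) mean of exponent m; with m = 1 this reduces to the simple average of the original Monge-Elkan method.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute; unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed internal measure: edit distance normalized to [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def generalized_monge_elkan(
    s: str,
    t: str,
    m: float = 2.0,
    sim: Callable[[str, str], float] = edit_similarity,
) -> float:
    """Power mean (exponent m) of each token's best match in the other string.

    m = 1 gives the simple average of the original Monge-Elkan method.
    This sketch only supports m > 0.
    """
    if m <= 0:
        raise ValueError("this sketch only supports m > 0")
    tokens_s, tokens_t = s.split(), t.split()
    if not tokens_s or not tokens_t:
        return 0.0
    best = [max(sim(a, b) for b in tokens_t) for a in tokens_s]
    return (sum(x ** m for x in best) / len(best)) ** (1.0 / m)


if __name__ == "__main__":
    for m in (1.0, 2.0, 5.0):
        print(m, generalized_monge_elkan("john smith", "jon smith", m=m))
```

Note that the measure is asymmetric: the mean runs over the tokens of the first argument only. Exponents above 1 give more weight to well-matched tokens, so the score for a given pair does not decrease as m grows.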