Meta similarity

  • Authors: Byung-Won On; Ingyu Lee

  • Affiliations: Singapore Management University, Singapore, Singapore; Troy University, Troy, USA

  • Venue: Applied Intelligence
  • Year: 2011

Abstract

Many string similarity metrics have been used to decide whether two given strings match, and they fall into three classes: (a) edit-distance-based similarities, (b) token-based similarities, and (c) hybrid similarities. Because each type of string similarity has its own strengths and weaknesses, the metrics in each class tend to work well only on particular data sets. To address this problem, we propose a novel Meta Similarity that both (i) outperforms the existing similarity metrics and (ii) is the least affected by the choice of data set. Our claim is empirically validated through extensive experiments: our proposal improves average recall by up to 20% over the best case of the existing similarity metrics, and our method is the most stable, with average recall ranging from 0.95 to 1.0 across all the data sets.
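
The three classes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Meta Similarity, whose construction the abstract does not describe. It only shows one common representative of each class, chosen here as an assumption: normalized Levenshtein similarity (edit-distance-based), Jaccard similarity over whitespace tokens (token-based), and a Monge-Elkan-style score that averages the best edit similarity per token (hybrid). All function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit-distance-based class: distance normalized into [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Token-based class: overlap of lowercase whitespace-token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def hybrid_similarity(a: str, b: str) -> float:
    """Hybrid class (Monge-Elkan style): for each token of `a`, take the
    best edit similarity against any token of `b`, then average."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 1.0 if ta == tb else 0.0
    return sum(max(edit_similarity(x, y) for y in tb) for x in ta) / len(ta)


if __name__ == "__main__":
    pair = ("jon smith", "john smith")
    print(edit_similarity(*pair))
    print(jaccard_similarity(*pair))
    print(hybrid_similarity(*pair))
```

On a pair such as "jon smith" and "john smith", the token-based score is penalized heavily by the single misspelled token, while the edit-distance-based and hybrid scores stay high. On reordered tokens such as "data mining" and "mining data", the opposite happens. This is the kind of data-set sensitivity that motivates combining the classes.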