A possibilistic approach to string comparison

  • Authors:
  • Antoon Bronselaer; Guy De Tré

  • Affiliations:
  • Department of Telecommunications and Information Processing, Ghent University, Ghent, Belgium (both authors)

  • Venue:
  • IEEE Transactions on Fuzzy Systems
  • Year:
  • 2009

Abstract

In this paper, string comparison is tackled from a possibilistic point of view. Instead of the concept of similarity between strings, the concept of coreference between strings is adopted, and the possibility of coreference is estimated by means of a possibilistic comparison operator. The literature distinguishes two important classes of string comparison methods: character-based methods, which treat a string as a sequence of characters, and token-based methods, which treat a string as a vector of substrings. The first contribution of this paper is a new character-based method that detects typographical errors and abbreviations. Its main advantage is its very low complexity compared with existing character-based techniques. The second contribution investigates two-level systems and describes a new approach whose novelty is the use of multiset comparison rather than vector comparison. It is shown that an ordered weighted conjunctive operator, which uses a parameterized fuzzy quantifier to deliver its weights, is competitive with frequency-based weights. In addition, deriving weights from a quantifier is significantly faster than existing weighting techniques. The third contribution is a novel class of hybrid techniques that combines the advantages of several methods. Finally, comparative tests of accuracy and execution time are reported.
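The two-level idea (compare tokens at the character level, then aggregate the token scores with quantifier-derived weights over a multiset of tokens) can be illustrated with a minimal sketch. It is not the authors' operator, and several assumptions should be stated loudly. Token similarity uses Python's `difflib.SequenceMatcher` ratio as a stand-in for the paper's character-based method. The quantifier is Yager's regular increasing monotone quantifier Q(r) = r^α with a hypothetical default α = 2. The aggregation is an ordinary ordered weighted average (OWA), which becomes min-like (conjunctive) for α > 1. The greedy token pairing and the zero score for unpaired tokens are also hypothetical choices. All function names are illustrative.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def quantifier_weights(n: int, alpha: float) -> List[float]:
    """OWA weights from the quantifier Q(r) = r**alpha (Yager's construction).

    w_i = Q(i/n) - Q((i-1)/n), so the weights sum to 1. alpha > 1 shifts
    weight towards the smallest scores (conjunctive, min-like); alpha = 1
    gives the arithmetic mean. (Assumed quantifier, not taken from the paper.)
    """
    if n <= 0:
        return []
    return [(i / n) ** alpha - ((i - 1) / n) ** alpha for i in range(1, n + 1)]


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: the i-th weight goes to the i-th largest value."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def token_similarity(a: str, b: str) -> float:
    """Stand-in character-level score (difflib ratio), NOT the paper's method."""
    return SequenceMatcher(None, a, b).ratio()


def multiset_compare(s: str, t: str, alpha: float = 2.0) -> float:
    """Hypothetical two-level comparison over token multisets (word order ignored).

    Tokens are paired greedily by decreasing similarity; each token left
    unpaired because the multisets differ in size contributes a score of 0.
    """
    a, b = s.lower().split(), t.lower().split()
    n = max(len(a), len(b))
    if n == 0:
        return 1.0  # convention chosen for this sketch: two empty strings match
    pairs = sorted(
        ((token_similarity(x, y), i, j) for i, x in enumerate(a) for j, y in enumerate(b)),
        reverse=True,
    )
    used_a, used_b, scores = set(), set(), []
    for score, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            scores.append(score)
    scores += [0.0] * (n - len(scores))  # unpaired tokens score 0
    return owa(scores, quantifier_weights(n, alpha))
```

For example, `multiset_compare("University Ghent", "Ghent University")` returns 1.0 because tokens are compared as multisets, whereas an extra unpaired token (`"Ghent University Hospital"`) lowers the score to 4/9 ≈ 0.44 with α = 2, reflecting the conjunctive bias. In this sketch the weights depend only on the number of tokens and on α, so no corpus statistics are needed, unlike frequency-based (e.g. IDF-style) weights.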