This paper tackles the comparison of strings from a possibilistic point of view. Instead of the concept of similarity between strings, coreference between strings is adopted, and the possibility of coreference is estimated by means of a possibilistic comparison operator. The literature distinguishes two important classes of string comparison methods: character-based methods and token-based methods. The first class treats a string as a sequence of characters, while the second treats a string as a vector of substrings. The first contribution of this paper is a new character-based method that is able to detect typographical errors and abbreviations. Its main advantage is a very low complexity compared with existing character-based techniques. As a second contribution, two-level systems are investigated and a new approach is described. The novelty of the proposed two-level system is the use of multiset comparison rather than vector comparison. It is shown that an ordered weighted conjunctive operator whose weights are delivered by a parameterized fuzzy quantifier is competitive with frequency-based weights. In addition, using a quantifier is significantly faster than using existing weighting techniques. As a third contribution, a novel class of hybrid techniques is proposed that combines the advantages of several methods. Finally, comparative tests of accuracy and execution time are performed and reported.
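To make the idea of quantifier-delivered weights concrete, the following is a minimal sketch and not the operator defined in the paper. It assumes the standard ordered weighted averaging (OWA) construction, in which weights are obtained from a fuzzy quantifier Q as w_i = Q(i/n) - Q((i-1)/n), and it assumes the parameterized form Q(r) = r^alpha. The function names `quantifier_weights` and `owa`, and the parameter `alpha`, are hypothetical and chosen only for illustration.

```python
def quantifier_weights(n, alpha=2.0):
    """Weights w_i = Q(i/n) - Q((i-1)/n) for the assumed quantifier Q(r) = r**alpha."""
    def q(r):
        return r ** alpha
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(values, alpha=2.0):
    """Ordered weighted average of scores in [0, 1].

    Scores are sorted in descending order before weighting, so the weights
    attach to rank positions rather than to particular tokens.
    With alpha = 1 this is the arithmetic mean; larger alpha shifts weight
    toward the lowest scores, which makes the aggregation more conjunctive.
    """
    ordered = sorted(values, reverse=True)
    weights = quantifier_weights(len(ordered), alpha)
    return sum(w * v for w, v in zip(weights, ordered))


if __name__ == "__main__":
    token_scores = [1.0, 0.5, 0.0]  # illustrative per-token match degrees
    print(owa(token_scores, alpha=1.0))
    print(owa(token_scores, alpha=2.0))
```

Because the weights depend only on the number of scores and on `alpha`, no corpus statistics have to be collected beforehand, which is one plausible reason a quantifier can be cheaper than frequency-based weighting.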