Properties of possibilistic string comparison

Authors:
Antoon Bronselaer; Guy De Tré
Affiliations:
Department of Telecommunications and Information Processing, Ghent University, Ghent, Belgium (both authors)
Venue:
IEEE Transactions on Fuzzy Systems
Year:
2010

Abstract

The problem of detecting coreferent objects of arbitrary complexity is a challenging topic in current research. A possibilistic solution is to treat it as an uncertain Boolean problem: two objects are either coreferent or not (i.e., a Boolean matter), but the uncertainty about this decision must be dealt with. An operator that determines the uncertainty about the coreference of two objects is called an evaluator. When we deal with structured objects, decomposition into attributes (i.e., atomic subobjects) allows the definition of evaluators on well-known subdomains. This paper continues previous research on evaluators for strings, a widely used data type for attributes. First, the Sugeno integral based on the framework of conditional necessity is shown to be related to the existing technique. More specifically, a special case of this Sugeno integral is equivalent to the regular conjunction of transformed possibilistic truth values, which is used by existing evaluators for strings. As a consequence, a subfamily of the existing string evaluators is obtained. This subfamily is shown to satisfy several interesting properties, which are used to construct an efficient optimization algorithm for string evaluators. Next, the use of a frequency filter is investigated. Finally, novel and advanced techniques, such as interlevel-information exchange and the use of multiple quantifiers, are defined and investigated. A series of tests on diverse datasets shows the high accuracy and robustness of the approach introduced in this paper.
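The abstract's claim that a special case of a Sugeno integral collapses to a conjunction can be illustrated with a minimal sketch. It uses the classical discrete Sugeno integral over values in [0, 1], not the paper's extension to possibilistic truth values under conditional necessity, so it is only an analogy. The function names, the sample scores, and both fuzzy measures are assumptions made for illustration; none of them come from the paper.

```python
from typing import Callable, Dict, FrozenSet

Measure = Callable[[FrozenSet[str]], float]


def sugeno_integral(values: Dict[str, float], measure: Measure) -> float:
    """Classical discrete Sugeno integral of `values` w.r.t. a fuzzy measure.

    Illustrative only: the paper works with possibilistic truth values,
    whereas this sketch uses plain numbers in [0, 1].
    """
    # Order the elements by ascending value.
    ordered = sorted(values, key=lambda k: values[k])
    best = 0.0
    for i, key in enumerate(ordered):
        # Set of elements whose value is at least values[key].
        upper = frozenset(ordered[i:])
        best = max(best, min(values[key], measure(upper)))
    return best


def all_or_nothing(universe: FrozenSet[str]) -> Measure:
    """Hypothetical measure: 1 for the full universe, 0 for any proper subset."""
    return lambda subset: 1.0 if subset == universe else 0.0


def proportional(universe: FrozenSet[str]) -> Measure:
    """Hypothetical measure: fraction of the universe that is covered."""
    return lambda subset: len(subset) / len(universe)


# Made-up per-component match degrees for two strings.
scores = {"a": 0.7, "b": 0.4, "c": 0.9}
universe = frozenset(scores)

conjunctive = sugeno_integral(scores, all_or_nothing(universe))
softer = sugeno_integral(scores, proportional(universe))
```

With the all-or-nothing measure, the integral equals the minimum of the scores, which is the standard min-based conjunction. The proportional measure gives a less strict aggregate, which hints at why the choice of measure (or quantifier) matters for an evaluator.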