The problem of detecting coreferent objects of arbitrary complexity is a challenging topic in current research. A possibilistic solution is to treat it as an uncertain Boolean problem: two objects are either coreferent or not (i.e., a Boolean matter), but the uncertainty about this decision must be dealt with. An operator that determines the uncertainty about the coreference of two objects is called an evaluator. When dealing with structured objects, decomposition into attributes (i.e., atomic subobjects) allows evaluators to be defined on well-known subdomains. This paper continues previous research on evaluators for strings, a widely used data type for attributes. First, the Sugeno integral based on the framework of conditional necessity is shown to be related to the existing technique. More specifically, a special case of this Sugeno integral is equivalent to the regular conjunction of transformed possibilistic truth values, which is used by existing evaluators for strings. As a consequence, a subfamily of the existing evaluator is obtained for strings. This subfamily is shown to satisfy several interesting properties, which are used to construct an efficient optimization algorithm for string evaluators. Next, the use of a frequency filter is investigated. Finally, novel and advanced techniques such as interlevel-information exchange and the use of multiple quantifiers are defined and investigated. A series of tests on diverse datasets shows the high accuracy and robustness of the approach introduced in this paper.