Entity resolution on uncertain relations

Authors:
Huabin Feng;Hongzhi Wang;Jianzhong Li;Hong Gao
Affiliations:
Harbin Institute of Technology, China;Harbin Institute of Technology, China;Harbin Institute of Technology, China;Harbin Institute of Technology, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Abstract

Entity resolution plays a pivotal role in many application areas. Because uncertainty is present in applications such as information extraction and online product categories, entity resolution must often be applied to uncertain data. This uncertainty makes it impossible to apply traditional techniques directly. In this paper, we propose techniques to perform entity resolution on uncertain data. First, we propose a new probabilistic similarity metric for uncertain tuples. Second, based on this metric, we propose novel pruning techniques that efficiently join pairs of uncertain tuples without enumerating all possible worlds. Finally, we propose a density-based clustering algorithm that combines the results of the pairwise similarity join. Through extensive experimental evaluation on synthetic and real-world data sets, we demonstrate the benefits and features of our approaches.
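
To make the idea of comparing uncertain tuples concrete, the following is a minimal illustrative sketch, not the metric or the pruning techniques proposed in the paper (the abstract does not specify them). It assumes each uncertain tuple is a list of mutually exclusive (value, probability) alternatives, that the two tuples are independent, and that Jaccard similarity over tokens is the underlying measure. The names jaccard and match_probability are hypothetical. The function enumerates every pair of alternatives, which is the brute-force baseline that pruning would aim to avoid.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def match_probability(t1, t2, threshold: float) -> float:
    """Probability that two uncertain tuples are similar.

    Assumption: each tuple is a list of (value, probability) alternatives
    whose probabilities sum to 1, and the two tuples are independent.
    The result sums the joint probability of every pair of alternatives
    whose Jaccard similarity reaches the threshold.
    """
    return sum(
        p1 * p2
        for (v1, p1), (v2, p2) in product(t1, t2)
        if jaccard(v1, v2) >= threshold
    )


# Hypothetical example data: two uncertain product records.
t1 = [("apple iphone 5", 0.7), ("apple iphone 5s", 0.3)]
t2 = [("iphone 5 apple", 0.6), ("samsung galaxy s4", 0.4)]
print(match_probability(t1, t2, threshold=0.8))
```

A pair of tuples could then be reported as a match when this probability exceeds a second, user-chosen confidence threshold. Both thresholds are assumptions of this sketch.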