Entity resolution plays a pivotal role in many application areas. Because uncertainty is inherent in many applications, such as information extraction and online product categorization, entity resolution must often be performed on uncertain data, and this uncertainty prevents traditional techniques from being applied directly. In this paper, we propose techniques for entity resolution on uncertain data. First, we propose a new probabilistic similarity metric for uncertain tuples. Second, based on this metric, we propose novel pruning techniques that efficiently compute pairwise similarity joins over uncertain tuples without enumerating all possible worlds. Finally, we propose a density-based clustering algorithm that combines the results of the pairwise similarity joins. Extensive experimental evaluation on synthetic and real-world data sets demonstrates the benefits and features of our approaches.
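The abstract does not define the metric, the pruning rules, or the clustering algorithm, so the following is only a minimal sketch of how the three steps could fit together, under assumptions that are not taken from the paper. Each uncertain tuple is modeled as a list of mutually exclusive alternatives (a token set plus its probability), tuples are assumed independent, and two tuples are treated as a match when P[Jaccard >= tau] >= alpha. The names `match_probability`, `is_match`, and `density_clusters`, the Jaccard measure, the set-size (length) bound, and the DBSCAN-style clustering are all hypothetical stand-ins. Note that only pairs of alternatives of the two tuples are examined, never whole possible worlds of the database.

```python
from itertools import product

# Assumed model (not from the paper): an uncertain tuple is a list of mutually
# exclusive alternatives, each a (token set, probability) pair; tuples are independent.
UncertainTuple = list[tuple[frozenset[str], float]]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Exact Jaccard similarity of two token sets."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def length_bound(a: frozenset[str], b: frozenset[str]) -> float:
    """Upper bound on jaccard(a, b) from set sizes alone (a length filter)."""
    small, large = sorted((len(a), len(b)))
    return 1.0 if large == 0 else small / large


def match_probability(r: UncertainTuple, s: UncertainTuple, tau: float) -> float:
    """Hypothetical metric: P[jaccard(r, s) >= tau], summed over pairs of alternatives."""
    total = 0.0
    for (a, pa), (b, pb) in product(r, s):
        # The size bound is checked first, so the exact Jaccard is skipped when it cannot reach tau.
        if length_bound(a, b) >= tau and jaccard(a, b) >= tau:
            total += pa * pb
    return total


def is_match(r: UncertainTuple, s: UncertainTuple, tau: float, alpha: float) -> bool:
    """Decide match_probability(r, s, tau) >= alpha, stopping as soon as the answer is settled."""
    remaining = sum(p for _, p in r) * sum(p for _, p in s)  # probability mass not yet examined
    acc = 0.0
    for (a, pa), (b, pb) in product(r, s):
        w = pa * pb
        remaining -= w
        if length_bound(a, b) >= tau and jaccard(a, b) >= tau:
            acc += w
            if acc >= alpha:
                return True   # threshold already reached: accept early
        if acc + remaining < alpha:
            return False      # even if every unexamined pair matched, alpha is out of reach
    return acc >= alpha


def density_clusters(tuples: list[UncertainTuple], tau: float, alpha: float,
                     min_pts: int) -> list[int]:
    """DBSCAN-style clustering over the match graph; returns one entity id per tuple.

    min_pts counts matching *other* tuples needed for a tuple to be a core tuple.
    """
    n = len(tuples)
    nbrs = [[j for j in range(n) if j != i and is_match(tuples[i], tuples[j], tau, alpha)]
            for i in range(n)]
    labels = [-1] * n             # -1 means "not yet assigned"
    cid = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue              # already clustered, or not a core tuple
        labels[i] = cid           # start a new cluster from an unvisited core tuple
        stack = [i]
        while stack:
            u = stack.pop()
            if len(nbrs[u]) < min_pts:
                continue          # border tuple: joins the cluster but does not expand it
            for v in nbrs[u]:
                if labels[v] == -1:
                    labels[v] = cid
                    stack.append(v)
        cid += 1
    for i in range(n):            # tuples left unclustered become singleton entities
        if labels[i] == -1:
            labels[i] = cid
            cid += 1
    return labels
```

The early exit in `is_match` is safe because the accumulated match probability can only grow, and it can grow by at most the mass of the alternative pairs not yet examined. For example, with `tau=0.6`, `alpha=0.4`, and `min_pts=2`, three uncertain listings of the same phone end up in one cluster, while an unrelated laptop listing becomes its own entity.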