We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports browser-based specification of complex ER workflows, including blocking and matching steps as well as the optional use of machine learning to automatically generate match classifiers. Specified workflows are automatically translated into MapReduce jobs for parallel execution on different Hadoop clusters. To achieve high performance, Dedoop supports several advanced load balancing strategies.
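To make the blocking, matching, and load-balancing steps more concrete, here is a minimal single-process sketch of how such a workflow can map onto MapReduce. It is not Dedoop's implementation or API. Every name in it (`RECORDS`, `blocking_key`, `map_phase`, `shuffle`, `pair_ranges`, `match_range`, `run_matching`) is hypothetical. The blocking key (first two letters of the last name token), the `difflib.SequenceMatcher` similarity, and the 0.8 threshold are illustrative assumptions. The balancing step simply splits all within-block comparisons into near-equal ranges, one per reducer, and may differ from the strategies Dedoop actually offers.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy input: (record id, name).
RECORDS = [
    (1, "Jon Smith"),
    (2, "John Smith"),
    (3, "Jane Doe"),
    (4, "Jane Dow"),
    (5, "Mary Smalls"),
]


def blocking_key(name):
    # Assumed blocking function: first two letters of the last token.
    return name.split()[-1].lower()[:2]


def map_phase(records):
    # Map: emit (blocking key, record) so similar records share a block.
    for rid, name in records:
        yield blocking_key(name), (rid, name)


def shuffle(emitted):
    # Simulated shuffle: group map output by key, as Hadoop would.
    blocks = defaultdict(list)
    for key, value in emitted:
        blocks[key].append(value)
    return blocks


def pair_ranges(blocks, num_reducers):
    # Illustrative load balancing: enumerate every within-block pair and
    # cut the list into contiguous ranges of near-equal size, so that one
    # large block does not overload a single reducer.
    all_pairs = [
        pair
        for key in sorted(blocks)
        for pair in combinations(sorted(blocks[key]), 2)
    ]
    size = max(1, -(-len(all_pairs) // num_reducers))  # ceiling division
    return [all_pairs[i:i + size] for i in range(0, len(all_pairs), size)]


def similarity(a, b):
    # Assumed string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_range(pairs, threshold):
    # Reduce: compare only the pairs assigned to this reducer.
    for (id1, n1), (id2, n2) in pairs:
        if similarity(n1, n2) >= threshold:
            yield (id1, id2)


def run_matching(records, num_reducers=2, threshold=0.8):
    blocks = shuffle(map_phase(records))
    matches = set()
    for assigned in pair_ranges(blocks, num_reducers):
        matches.update(match_range(assigned, threshold))
    return matches


if __name__ == "__main__":
    print(sorted(run_matching(RECORDS)))
```

With these toy records, blocking yields two blocks (`sm` with three records, `do` with two), for four candidate pairs split evenly across two reducers. Only the pairs (1, 2) and (3, 4) clear the threshold. In a real Hadoop job, the range assignment would be computed from block-size statistics gathered in a preliminary job rather than by materializing every pair in memory.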