The machine learning community has recently made several advances on the Entity Matching (EM) problem. However, their lack of scalability has prevented these techniques from being applied in practical settings on large real-life datasets. To address this, we propose a principled framework for scaling any generic EM algorithm. Our technique runs multiple instances of the EM algorithm on small neighborhoods of the data and passes messages across neighborhoods to construct a global solution. We prove formal properties of our framework and experimentally demonstrate its effectiveness in scaling EM algorithms.
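The idea of running a matcher on small neighborhoods and passing messages between them can be illustrated with a minimal sketch. This is not the framework from the paper: the record table `RECORDS`, the toy rule in `toy_matcher`, the choice of matched pairs as the "messages", and the re-activation rule in `run_with_message_passing` are all assumptions made for illustration.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]
Matcher = Callable[[Set[str], Set[Pair]], Set[Pair]]

# Hypothetical toy data: record id -> (author name, id of a coauthor record).
RECORDS: Dict[str, Tuple[str, str]] = {
    "a1": ("J. Smith", "b1"),
    "a2": ("John Smith", "b2"),
    "b1": ("Mary Jones", "a1"),
    "b2": ("Mary Jones", "a2"),
}


def toy_matcher(hood: Set[str], evidence: Set[Pair]) -> Set[Pair]:
    """Stand-in for a black-box matcher; it only compares records inside `hood`."""
    found: Set[Pair] = set()
    for r, s in combinations(sorted(hood), 2):
        (name_r, co_r), (name_s, co_s) = RECORDS[r], RECORDS[s]
        same_name = name_r == name_s
        # Weak name agreement: same last name and same first initial.
        compatible = (name_r.split()[-1] == name_s.split()[-1]
                      and name_r[0] == name_s[0])
        # Collective evidence: the two coauthors are already known to match.
        coauthors_match = co_r == co_s or frozenset((co_r, co_s)) in evidence
        if same_name or (compatible and coauthors_match):
            found.add(frozenset((r, s)))
    return found


def run_with_message_passing(neighborhoods: List[Set[str]],
                             matcher: Matcher) -> Set[Pair]:
    """Run `matcher` per neighborhood, re-running neighborhoods that receive news."""
    matches: Set[Pair] = set()
    queue = deque(range(len(neighborhoods)))
    queued = set(queue)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        hood = neighborhoods[i]
        # Messages delivered to this neighborhood: known matches touching it.
        evidence = {p for p in matches if p & hood}
        new = matcher(hood, evidence) - matches
        if not new:
            continue
        matches |= new
        # Re-activate every other neighborhood sharing a record with a new match.
        touched = set().union(*new)
        for j, other in enumerate(neighborhoods):
            if j != i and j not in queued and other & touched:
                queue.append(j)
                queued.add(j)
    return matches


# Two small overlapping neighborhoods (they share record "b1").
neighborhoods = [{"a1", "a2", "b1"}, {"b1", "b2"}]
result = run_with_message_passing(neighborhoods, toy_matcher)
print(sorted(sorted(pair) for pair in result))
```

In this toy run, the first pass over the first neighborhood finds nothing, because the two Smith records agree only weakly. Once the second neighborhood matches the two Jones records, that match is passed back and the first neighborhood is re-run with the new evidence. The loop stops when no neighborhood produces a new match, which is guaranteed here because the set of matches only grows and the number of possible pairs is finite.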