Large-scale collective entity matching

Authors:
Vibhor Rastogi;Nilesh Dalvi;Minos Garofalakis
Affiliations:
Yahoo! Research;Yahoo! Research;Technical University of Crete
Venue:
Proceedings of the VLDB Endowment
Year:
2011

Abstract

The machine learning community has recently made several advances on the Entity Matching (EM) problem. However, the resulting algorithms scale poorly, which has prevented their application in practical settings on large real-life datasets. To address this, we propose a principled framework for scaling any generic EM algorithm. Our technique runs multiple instances of the EM algorithm on small neighborhoods of the data and passes messages across neighborhoods to construct a global solution. We prove formal properties of our framework and experimentally demonstrate its effectiveness in scaling EM algorithms.
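
The idea in the abstract (run a matcher on small neighborhoods and pass messages between them) can be illustrated with a minimal sketch. This is not the authors' algorithm: the record layout, the similarity rule, `toy_matcher`, and the fixed-point loop in `collective_match` are all assumptions made for illustration, and the "messages" are simply the matched pairs found so far, shared with every neighborhood.

```python
from itertools import combinations


def weakly_similar(a, b):
    """Toy similarity rule (assumption): same last name and same first initial."""
    return a.split()[-1] == b.split()[-1] and a[0] == b[0]


def toy_matcher(ids, records, evidence):
    """Hypothetical stand-in for a generic EM algorithm run on one neighborhood.

    Two records match if their names are identical, or if their names are
    weakly similar and their coauthors are already known to match.
    """
    found = set()
    for x, y in combinations(sorted(ids), 2):
        (name_x, co_x), (name_y, co_y) = records[x], records[y]
        coauthors_matched = frozenset((co_x, co_y)) in evidence
        if name_x == name_y or (weakly_similar(name_x, name_y) and coauthors_matched):
            found.add(frozenset((x, y)))
    return found


def collective_match(neighborhoods, records, matcher):
    """Re-run the matcher on every neighborhood until no new matches appear."""
    evidence = set()  # matches found so far; these act as the messages
    changed = True
    while changed:
        changed = False
        for ids in neighborhoods:
            new = matcher(ids, records, evidence) - evidence
            if new:
                evidence |= new
                changed = True
    return evidence


# Illustrative data: record id -> (name, id of a coauthor record).
records = {
    "a1": ("J. Smith", "b1"),
    "a2": ("John Smith", "b2"),
    "b1": ("Mary Jones", "a1"),
    "b2": ("Mary Jones", "a2"),
}
neighborhoods = [{"a1", "a2"}, {"b1", "b2"}]
matches = collective_match(neighborhoods, records, toy_matcher)
```

In this toy data, the neighborhood `{"a1", "a2"}` yields no match on its own because the names are only weakly similar. Once the neighborhood `{"b1", "b2"}` reports that the two coauthor records match, the first neighborhood is re-run with that evidence and resolves its pair as well. The loop terminates because the evidence set only grows and the number of record pairs is finite.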