Entity matching plays a crucial role in information integration across heterogeneous data sources, and numerous solutions have been developed. Entity resolution based on a reference table is highly efficient and easy to update. In such methods, the reference table is key to effective entity matching. In this paper, we focus on constructing an effective reference table by exploiting the co-occurrence relationships between tokens to identify suitable entity names. To achieve high efficiency and accuracy, we first model the data set as a graph and then cluster its vertices in two stages. Based on the connectivity between vertices, we also mine synonyms to obtain an expanded reference table. We develop an iterative system and conduct an experimental study on real data. Experimental results show that the proposed method achieves both high accuracy and high efficiency.
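The abstract does not specify how the token graph is built or how the two clustering stages work. The following is a minimal sketch of the general idea only, not the authors' method. It assumes whitespace tokenization and an edge between two tokens whenever they co-occur in at least `min_count` records. It also substitutes plain connected components (via union-find) for the paper's two-stage clustering. The function names `cooccurrence_graph` and `connected_clusters` and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(records, min_count=2):
    """Hypothetical sketch: count how often each unordered token pair appears
    in the same record, keeping pairs seen at least `min_count` times as edges."""
    counts = Counter()
    for record in records:
        # Deduplicate and sort so each pair is counted once per record, in a fixed order.
        tokens = sorted(set(record.lower().split()))
        counts.update(combinations(tokens, 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}


def connected_clusters(edges):
    """Group tokens into connected components using union-find.
    This stands in for the paper's (unspecified) two-stage clustering."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups = {}
    for token in list(parent):
        groups.setdefault(find(token), set()).add(token)
    # Sort for deterministic output: tokens within a cluster, then clusters.
    return sorted(sorted(g) for g in groups.values())


records = [
    "new york university",
    "new york university library",
    "new york times",
    "stanford university",
]
print(connected_clusters(cooccurrence_graph(records)))
# → [['new', 'university', 'york']]
```

With `min_count=2`, only the pairs among "new", "york", and "university" survive, so they form a single candidate entity-name cluster. Lowering the threshold to 1 links every token into one component. This illustrates why a simple single-threshold approach is fragile and why a more careful clustering scheme, such as the two-stage one the abstract mentions, is needed.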