EIF: a framework of effective entity identification

Authors:
Lingli Li;Hongzhi Wang;Hong Gao;Jianzhong Li
Affiliations:
Department of Computer Science and Engineering, Harbin Institute of Technology, China;Department of Computer Science and Engineering, Harbin Institute of Technology, China;Department of Computer Science and Engineering, Harbin Institute of Technology, China;Department of Computer Science and Engineering, Harbin Institute of Technology, China
Venue:
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Year:
2010

Abstract

Entity identification, the task of establishing correspondences between objects and entities in dirty data, plays an important role in data cleaning. Dirty data often results from confusion between entities and their names: different entities may share the same name, and different names may refer to the same entity. The major task of entity identification is therefore to distinguish entities that share a name and to recognize different names that refer to the same entity. However, existing research addresses only one of these two aspects and thus cannot solve the problem completely. To address this, this paper proposes EIF, an entity identification framework that considers both kinds of confusion. By combining effective clustering techniques, approximate string matching algorithms, and a flexible mechanism for knowledge integration, EIF can be applied to many different kinds of entity identification problems. As an application of EIF, we solve the author identification problem. Extensive experiments verify the effectiveness of the framework.
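The two kinds of confusion named in the abstract can be illustrated with a minimal Python sketch. This is not the EIF algorithm, which the abstract does not spell out. It assumes `difflib` similarity as a stand-in for approximate string matching, a greedy grouping rule with a hypothetical threshold of 0.8, and shared co-authors as the clustering signal for author identification. All function names and sample data are illustrative.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Approximate string similarity in [0, 1] (difflib ratio, an assumed stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar_names(names, threshold=0.8):
    """Different names, same entity: greedily group name variants.

    A name joins the first group whose representative (first member)
    is at least `threshold` similar; otherwise it starts a new group.
    """
    groups = []
    for name in names:
        for group in groups:
            if name_similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def split_by_coauthors(records):
    """Same name, different entities: cluster records by shared co-authors.

    `records` is a list of (record_id, coauthor_list) pairs that all carry
    the same author name. Records sharing a co-author, directly or
    transitively, are placed in the same cluster.
    """
    clusters = []  # each item: (set of co-authors, list of record ids)
    for rec_id, coauthors in records:
        coauthors = set(coauthors)
        merged_ids = [rec_id]
        remaining = []
        for c_authors, c_ids in clusters:
            if c_authors & coauthors:
                # Overlap: absorb the existing cluster into the new one.
                coauthors |= c_authors
                merged_ids = c_ids + merged_ids
            else:
                remaining.append((c_authors, c_ids))
        remaining.append((coauthors, merged_ids))
        clusters = remaining
    return [sorted(ids) for _, ids in clusters]


if __name__ == "__main__":
    print(group_similar_names(["Wei Wang", "Maria Garcia", "Wei Wnag"]))
    print(split_by_coauthors([
        ("p1", ["Li", "Gao"]),
        ("p2", ["Gao", "Chen"]),
        ("p3", ["Smith"]),
    ]))
```

In this sketch the first function merges spelling variants of one name, and the second separates papers published under one name into likely distinct authors. A fuller system would combine both steps and add further evidence, which is the role the abstract assigns to knowledge integration.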