EIF: a framework of effective entity identification

  • Authors:
  • Lingli Li; Hongzhi Wang; Hong Gao; Jianzhong Li

  • Affiliations:
  • Department of Computer Science and Engineering, Harbin Institute of Technology, China (all four authors)

  • Venue:
  • WAIM'10: Proceedings of the 11th International Conference on Web-Age Information Management
  • Year:
  • 2010

Abstract

Entity identification, that is, building correspondences between the objects in dirty data and the entities they refer to, plays an important role in data cleaning. Dirty data often result from confusion between entities and their names: different entities may share the same name, and different names may refer to the same entity. The major task of entity identification is therefore to distinguish entities that share a name and to recognize different names that refer to the same entity. However, current research addresses only one of these two aspects and so cannot solve the problem completely. To address this, this paper proposes EIF, an entity identification framework that takes both kinds of confusion into account. By combining effective clustering techniques, approximate string matching algorithms, and a flexible knowledge-integration mechanism, EIF can be applied to many different kinds of entity identification problems. As an application of EIF, we solve the author identification problem. Extensive experiments verify the effectiveness of the framework.
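The abstract does not specify which clustering or string matching algorithms EIF uses, so the following is only a hypothetical sketch of how the two kinds of confusion interact in author identification. It assumes each record carries an author name and a set of coauthors, uses normalized Levenshtein similarity as the approximate string matcher (so that spelling variants of one name can be merged), and requires at least one shared coauthor before linking two records (so that namesakes stay apart). Linked records are then grouped by single-linkage clustering via union-find. The function names, the record layout, and the 0.8 threshold are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


class UnionFind:
    """Disjoint sets used to form single-linkage clusters of records."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def identify_entities(records, name_threshold=0.8):
    """Hypothetical sketch, not EIF itself.

    records: dict mapping record_id -> (author_name, set_of_coauthors).
    Returns a list of sets of record ids, one set per inferred entity.
    """
    uf = UnionFind(records)
    for r1, r2 in combinations(records, 2):
        (n1, co1), (n2, co2) = records[r1], records[r2]
        # Similar names handle "different names, same entity";
        # the shared-coauthor check handles "same name, different entities".
        if name_similarity(n1, n2) >= name_threshold and co1 & co2:
            uf.union(r1, r2)
    groups = {}
    for r in records:
        groups.setdefault(uf.find(r), set()).add(r)
    return list(groups.values())


records = {
    "p1": ("Hongzhi Wang", {"Jianzhong Li"}),
    "p2": ("Hong-zhi Wang", {"Jianzhong Li", "Hong Gao"}),  # spelling variant
    "p3": ("Hongzhi Wang", {"Alice Smith"}),                # hypothetical namesake
}
print(sorted(sorted(g) for g in identify_entities(records)))  # [['p1', 'p2'], ['p3']]
```

Requiring both conditions before linking is what lets one pass address both confusions at once. A real system would replace the coauthor test with richer evidence (venues, affiliations, topics) and avoid the quadratic pairwise loop with blocking.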