Improving the accuracy of entity identification through refinement

  • Authors: Yue Kou
  • Affiliations: Northeastern University, Shenyang, China
  • Venue: Ph.D. '08: Proceedings of the 2008 EDBT Ph.D. workshop
  • Year: 2008

Abstract

With the rapid growth of Web databases, it has become necessary to automatically integrate the large-scale data available on the Web. However, overlapping information from different data sources impairs the quality of data integration. The goal of entity identification is therefore to correctly identify all instances of the same entity, so as to eliminate inconsistencies among data sources during integration. In this paper, we present a Three-phase Gradual Refining based Entity Identification Mechanism called TGR-EIM. Unlike traditional approaches, it analyzes not only the attribute features of instances but also their semantic context and statistical constraints to improve the accuracy of entity identification. Moreover, a self-Adaptive Knowledge Maintenance method (AKM) is proposed to maintain the completeness and validity of the instance relationship knowledge generated by TGR-EIM. Experiments demonstrate the feasibility and effectiveness of the key techniques of TGR-EIM.
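
The abstract names three sources of evidence (attribute features, semantic context, statistical constraints) but does not say how each phase is computed. The Python sketch below is only an illustration of what a gradual, three-phase refinement could look like; it is not the TGR-EIM algorithm. Everything in it is an assumption made for the example: the record layout (a `title` string and a `context` set), the use of Jaccard similarity, the thresholds and blending weight, and the one-to-one matching rule standing in for a statistical constraint.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_phase(source_a, source_b, threshold=0.3):
    """Phase 1 (assumed): keep cross-source pairs whose title tokens overlap enough."""
    candidates = {}
    for (id_a, rec_a), (id_b, rec_b) in product(source_a.items(), source_b.items()):
        tokens_a = set(rec_a["title"].lower().split())
        tokens_b = set(rec_b["title"].lower().split())
        score = jaccard(tokens_a, tokens_b)
        if score >= threshold:
            candidates[(id_a, id_b)] = score
    return candidates


def context_phase(candidates, source_a, source_b, weight=0.5):
    """Phase 2 (assumed): blend each score with the similarity of the context sets."""
    refined = {}
    for (id_a, id_b), score in candidates.items():
        ctx = jaccard(source_a[id_a]["context"], source_b[id_b]["context"])
        refined[(id_a, id_b)] = (1 - weight) * score + weight * ctx
    return refined


def constraint_phase(scored, min_score=0.5):
    """Phase 3 (assumed): enforce one-to-one matches, greedily by descending score."""
    used_a, used_b, matches = set(), set(), []
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    for (id_a, id_b), score in ranked:
        if score >= min_score and id_a not in used_a and id_b not in used_b:
            matches.append((id_a, id_b))
            used_a.add(id_a)
            used_b.add(id_b)
    return matches


def identify_entities(source_a, source_b):
    """Run the three hypothetical phases in sequence and return matched ID pairs."""
    candidates = attribute_phase(source_a, source_b)
    refined = context_phase(candidates, source_a, source_b)
    return constraint_phase(refined)
```

Each phase only narrows or re-scores the output of the previous one, which is the sense of "gradual refining" assumed here: cheap attribute comparison produces candidates, context evidence adjusts their scores, and a global constraint resolves conflicts among the survivors.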