Improving the accuracy of entity identification through refinement

Authors:
Yue Kou
Affiliations:
Northeastern University, Shenyang, China
Venue:
Ph.D. '08 Proceedings of the 2008 EDBT Ph.D. workshop
Year:
2008

Abstract

With the rapid growth of Web databases, large-scale data available on the Web must be integrated automatically. However, overlapping information from different data sources impairs the quality of data integration. The goal of entity identification is therefore to correctly identify all instances of the same entity, so as to eliminate inconsistencies among data sources during integration. In this paper, we present a Three-phase Gradual Refining based Entity Identification Mechanism, called TGR-EIM. Unlike traditional approaches, TGR-EIM analyzes not only the attribute features of instances but also their semantic context and statistical constraints to improve the accuracy of entity identification. Moreover, we propose a self-Adaptive Knowledge Maintenance method (AKM) to maintain the completeness and validity of the instance relationship knowledge generated by TGR-EIM. Various experiments demonstrate the feasibility and effectiveness of the key techniques of TGR-EIM.
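The abstract names three kinds of evidence (attribute features, semantic context, statistical constraints) refined in three phases, but does not say how each is computed or combined. The Python sketch below is a hypothetical illustration of that general shape, not TGR-EIM itself. Every concrete choice is an assumption: token Jaccard similarity stands in for attribute features, overlap of related-entity names (e.g. co-authors) stands in for semantic context, and a "at most one match per instance per other source" rule stands in for the statistical constraints. The names `Instance`, `phase1_attribute`, `phase2_context`, `phase3_constraint`, `identify` and all thresholds are invented for the example.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Instance:
    uid: str
    source: str
    attrs: dict[str, str]
    # Assumed semantic context: names of related entities (e.g. co-authors).
    context: set[str] = field(default_factory=set)


def tokens(inst: Instance) -> set[str]:
    """Lower-cased whitespace tokens over all attribute values."""
    return {t.lower() for v in inst.attrs.values() for t in v.split()}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def phase1_attribute(instances, threshold=0.5):
    """Phase 1 (assumed): keep cross-source pairs with similar attributes."""
    cands = {}
    for x, y in combinations(instances, 2):
        if x.source == y.source:
            continue  # assumption: each source is already deduplicated
        s = jaccard(tokens(x), tokens(y))
        if s >= threshold:
            cands[(x.uid, y.uid)] = s
    return cands


def phase2_context(cands, by_id, weight=0.5):
    """Phase 2 (assumed): blend attribute score with context overlap."""
    return {
        (a, b): (1 - weight) * s + weight * jaccard(by_id[a].context, by_id[b].context)
        for (a, b), s in cands.items()
    }


def phase3_constraint(scores, by_id, min_score=0.4):
    """Phase 3 (assumed constraint): an instance matches at most one
    instance per other source; higher-scoring pairs are kept first."""
    matched = set()  # (instance uid, source it already has a match in)
    kept = []
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < min_score:
            break  # all remaining pairs score lower
        if (a, by_id[b].source) in matched or (b, by_id[a].source) in matched:
            continue  # would create a second match in the same source
        matched.add((a, by_id[b].source))
        matched.add((b, by_id[a].source))
        kept.append((a, b))
    return kept


def identify(instances, t_attr=0.5, w_ctx=0.5, t_final=0.4):
    by_id = {i.uid: i for i in instances}
    cands = phase1_attribute(instances, t_attr)
    scores = phase2_context(cands, by_id, w_ctx)
    return phase3_constraint(scores, by_id, t_final)


# Illustrative (made-up) records from two hypothetical sources A and B.
data = [
    Instance("a1", "A", {"name": "Yue Kou", "org": "Northeastern University"},
             {"Ge Yu", "Derong Shen"}),
    Instance("b1", "B", {"name": "Yue Kou", "org": "Northeastern University China"},
             {"Ge Yu", "Derong Shen", "Tiezheng Nie"}),
    Instance("b2", "B", {"name": "Yu Kou", "org": "Northeastern University"},
             {"Ge Yu", "Wei Wang"}),
]
print(identify(data))  # [('a1', 'b1')]
```

In this toy run, both `b1` and `b2` pass the attribute phase for `a1`, context overlap favors `b1`, and the assumed one-match-per-source rule then rejects `a1`/`b2` even though it clears the final threshold. This shows how each later phase can only narrow the earlier candidate set, which is one plausible reading of "gradual refining".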