With the rapid growth of Web databases, it is necessary to automatically integrate the large-scale data available on the Web. However, overlapping information from different data sources impairs the quality of data integration. The goal of entity identification is therefore to correctly identify all instances of the same entity, so as to eliminate inconsistencies among data sources during integration. In this paper, we present a Three-phase Gradual Refining based Entity Identification Mechanism called TGR-EIM. Unlike traditional approaches, TGR-EIM analyzes not only the attribute features of instances but also their semantic context and statistical constraints to improve the accuracy of entity identification. Moreover, a self-Adaptive Knowledge Maintenance method (AKM) is proposed to maintain the completeness and validity of the instance relationship knowledge generated by TGR-EIM. Experiments demonstrate the feasibility and effectiveness of the key techniques of TGR-EIM.