Proceedings of the VLDB Endowment
In the real world, entities change dynamically, and the changes are captured in two dimensions: time and space. For data sets that contain temporal records, where each record is associated with a time stamp and describes some aspects of a real-world entity at that particular time, we often wish to identify records that describe the same entity over time and thereby enable interesting longitudinal data analysis. For data sets that contain geographically referenced data describing real-world entities at different locations (i.e., location entities), we wish to link those entities that belong to the same organization or network. However, existing record linkage techniques ignore the additional evidence in temporal and spatial data and can fall short in these cases. This proposal studies linking temporal and spatial records. For temporal record linkage, we apply time decay to capture the effect of elapsed time on entity value evolution, and propose clustering methods that consider the time order of records. For linking location records, we distinguish between strong and weak evidence; for the former, we study core generation in the presence of erroneous data, and then leverage the discovered strong evidence to make the remaining decisions.
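
To make the idea of time decay concrete, the following is a minimal sketch and not the method proposed here. It assumes an exponential decay, a hypothetical `rate` parameter, records stored as dictionaries with a `year` field, and exact-match comparison of attribute values; all of these names and choices are illustrative assumptions. The intuition it shows is that a value mismatch between two records counts for less as more time separates them, because the entity may simply have changed.

```python
import math


def disagreement_weight(elapsed_years, rate=0.5):
    """Weight given to a value mismatch; shrinks as elapsed time grows.

    The exponential form and the default rate are assumptions for illustration.
    """
    return math.exp(-rate * elapsed_years)


def decayed_similarity(rec_a, rec_b, attributes, rate=0.5):
    """Weighted fraction of agreeing attributes between two records.

    Agreeing attributes keep full weight. Disagreeing attributes are
    down-weighted by the time decay, so old mismatches hurt less.
    """
    elapsed = abs(rec_a["year"] - rec_b["year"])
    weighted_sum = 0.0
    total_weight = 0.0
    for attr in attributes:
        same = rec_a[attr] == rec_b[attr]
        weight = 1.0 if same else disagreement_weight(elapsed, rate)
        weighted_sum += weight if same else 0.0
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0


# Hypothetical records: same name, different affiliation.
r2000 = {"name": "X. Li", "affiliation": "Univ. A", "year": 2000}
r2000b = {"name": "X. Li", "affiliation": "Univ. B", "year": 2000}
r2010 = {"name": "X. Li", "affiliation": "Univ. B", "year": 2010}

same_year = decayed_similarity(r2000, r2000b, ["name", "affiliation"])
ten_years = decayed_similarity(r2000, r2010, ["name", "affiliation"])
```

In this sketch the affiliation mismatch between two records from the same year halves the similarity, while the same mismatch across ten years barely lowers it. A fuller treatment would likely learn the decay per attribute from data rather than fix a single rate.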