A knowledge-based approach for duplicate elimination in data cleaning
Information Systems - Data extraction, cleaning and reconciliation
Approximate duplicate elimination is an important data-integration task, but its complex comparisons of many records involving uncertainty and ambiguity make it difficult. Earlier approaches required a time-consuming and tedious process of hard coding of static rules based on a schema. A novel duplicate-elimination framework now lets users clean data flexibly and effortlessly, without any coding. Exploiting fuzzy inference inherently handles the problem's uncertainty, and unique machine learning capabilities let the framework adapt to the specific notion of similarity appropriate for each domain. The framework is extensible and accommodative, letting the user operate with or without training data. Additionally, many of the previous methods for duplicate elimination can be implemented quickly using this framework.
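
To make the idea of fuzzy inference over record comparisons concrete, here is a minimal Python sketch. It is not the framework described in the abstract: the field names (`name`, `address`), the membership breakpoints, the single rule, and the 0.5 decision threshold are all assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for whatever field comparator a real system would use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crisp string similarity in [0, 1]; a stand-in for any field comparator."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def high(x: float, lo: float = 0.5, hi: float = 0.9) -> float:
    """Membership in the fuzzy set 'high': 0 at or below lo, 1 at or above hi,
    linear in between. The breakpoints are hypothetical, not from the source."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def duplicate_degree(r1: dict, r2: dict) -> float:
    """Fire one illustrative rule:
    IF name similarity is high AND address similarity is high THEN duplicate.
    Fuzzy AND is taken as the minimum of the two membership degrees."""
    name_high = high(similarity(r1["name"], r2["name"]))
    addr_high = high(similarity(r1["address"], r2["address"]))
    return min(name_high, addr_high)


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.5) -> bool:
    """Turn the fuzzy degree into a yes/no decision (threshold is an assumption)."""
    return duplicate_degree(r1, r2) >= threshold
```

Because the rule yields a degree rather than a hard yes/no, borderline pairs can be ranked or sent for review instead of being silently merged or kept. A learning component, as the abstract suggests, could tune the breakpoints or the threshold from labeled pairs when training data is available.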