Data cleaning focuses on identifying and removing violations of consistency constraints. Existing approaches perform only statistical repair operations, such as inserting average or default values. The result is consistent data, but the repaired values no longer bear any similarity to the original inconsistent data. An ontology-based approach instead allows the detection of semantically related, context-aware correction suggestions. We introduce measures for the semantic distance between concepts in an ontology and define metrics that use these distances to calculate the similarity of each correction suggestion to the invalid tuple. The suggestions can then be presented to end users in data cleaning environments. We apply the approach in a cancer registry, which collects data about cancer cases, and show how it can support the registry's domain experts in data cleaning.
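To make the idea of ranking correction suggestions by semantic distance concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes a toy is-a hierarchy, a simple edge-counting distance through the lowest common ancestor, and a similarity of 1 / (1 + distance). The concept names, the `PARENT` table, and the functions `path_distance`, `similarity`, and `rank_suggestions` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy is-a hierarchy (child -> parent); not taken from the paper.
PARENT: Dict[str, Optional[str]] = {
    "tumor_site": None,
    "digestive_organ": "tumor_site",
    "respiratory_organ": "tumor_site",
    "colon": "digestive_organ",
    "rectum": "digestive_organ",
    "lung": "respiratory_organ",
}


def ancestors(concept: str) -> List[str]:
    """Return the chain from the concept up to the root, concept first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_distance(a: str, b: str) -> int:
    """Count is-a edges between a and b via their lowest common ancestor."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            return steps_from_a + steps_from_b[c]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def similarity(a: str, b: str) -> float:
    """Assumed mapping from distance to a score in (0, 1]."""
    return 1.0 / (1.0 + path_distance(a, b))


def rank_suggestions(invalid_value: str,
                     candidates: List[str]) -> List[Tuple[str, float]]:
    """Order candidate corrections by similarity to the invalid value."""
    scored = [(c, similarity(invalid_value, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for concept, score in rank_suggestions("colon", ["lung", "rectum"]):
        print(f"{concept}: {score:.2f}")
```

In this sketch a sibling concept ranks above a concept from a different branch, which is the behavior a domain expert would expect from a context-aware suggestion. Information-content measures, which weight concepts by how specific they are, would be a natural replacement for the edge-counting distance assumed here.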