Real data is often affected by errors and inconsistencies. Many of these arise because schemas cannot express a sufficiently wide range of constraints. Data cleaning is the process of identifying and, where possible, correcting the quality problems that affect the data. Cleaning data requires gathering knowledge about the domain to which the data refer. However, existing data cleaning techniques still access this knowledge as a fragmented collection of heterogeneous rules and ad hoc data transformations. Furthermore, no data cleaning methodology has yet been proposed for an important class of data, namely data based on the semistructured XML data model. In this paper we introduce the OXC framework, which offers a methodology for XML data cleaning based on a uniform representation of domain knowledge through an ontology. We describe how to define XML-related data quality metrics based on our domain knowledge representation, and we define several metrics related to the completeness dimension of data quality.
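The abstract does not give the metric definitions themselves, so the following is only an illustrative sketch of one common notion of completeness (the fraction of expected property values actually present), not the OXC definition. The `EXPECTED_PROPERTIES` mapping is a hypothetical stand-in for domain knowledge that an ontology would supply, and the element names and sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ontology knowledge: for each element tag (read as
# a concept), the child tags (read as properties) that are expected to exist.
EXPECTED_PROPERTIES = {
    "person": ["name", "email", "birthdate"],
}


def property_completeness(root, concept, expected=EXPECTED_PROPERTIES):
    """Return the fraction of expected property values that are present and
    non-empty across all elements tagged `concept`, or None if there are no
    such elements (or no expected properties) to measure."""
    props = expected.get(concept, [])
    instances = list(root.iter(concept))
    if not instances or not props:
        return None
    filled = 0
    for inst in instances:
        for prop in props:
            child = inst.find(prop)
            # A property counts only if the child exists and has real text.
            if child is not None and (child.text or "").strip():
                filled += 1
    return filled / (len(instances) * len(props))


# Invented sample data: the second person has an empty email and no birthdate.
SAMPLE = """
<people>
  <person><name>Ada</name><email>ada@example.org</email><birthdate>1815-12-10</birthdate></person>
  <person><name>Bob</name><email></email></person>
</people>
""".strip()

root = ET.fromstring(SAMPLE)
score = property_completeness(root, "person")
print(f"person completeness: {score:.2f}")
```

In this sketch an empty element is treated the same as a missing one. Whether that is appropriate depends on the domain, and it is exactly the kind of decision that a uniform knowledge representation is meant to make explicit.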