Using ontologies for XML data cleaning

Authors:
Diego Milano;Monica Scannapieco;Tiziana Catarci
Affiliations:
Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza”, Roma, Italy;Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza”, Roma, Italy;Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza”, Roma, Italy
Venue:
OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems
Year:
2005

Abstract

Real data is often affected by errors and inconsistencies. Many of these arise because schemas cannot represent a sufficiently wide range of constraints. Data cleaning is the process of identifying and, where possible, correcting the data quality problems that affect the data. Cleaning data requires gathering knowledge about the domain to which the data refer. However, existing data cleaning techniques still access this knowledge as a fragmented collection of heterogeneous rules and ad hoc data transformations. Furthermore, data cleaning methodologies for an important class of data, namely data based on the semistructured XML data model, have not yet been proposed. In this paper we introduce the OXC framework, which offers a methodology for XML data cleaning based on a uniform representation of domain knowledge through an ontology. We describe how to define XML-related data quality metrics based on our domain knowledge representation, and we define various metrics related to the completeness data quality dimension.