Using ontologies for XML data cleaning

  • Authors:
  • Diego Milano, Monica Scannapieco, Tiziana Catarci

  • Affiliations:
  • Dipartimento di Informatica e Sistemistica, Università degli Studi di Roma “La Sapienza”, Roma, Italy (all three authors)

  • Venue:
  • OTM'05: Proceedings of the 2005 OTM Confederated International Conference on On the Move to Meaningful Internet Systems
  • Year:
  • 2005

Abstract

Real data is often affected by errors and inconsistencies, many of which arise because schemas cannot express a sufficiently wide range of constraints. Data cleaning is the process of identifying and possibly correcting the quality problems that affect data. Cleaning data requires knowledge of the domain to which the data refer; however, existing data cleaning techniques still access this knowledge as a fragmented collection of heterogeneous rules and ad hoc data transformations. Furthermore, no data cleaning methodologies have yet been proposed for an important class of data: semistructured data based on the XML data model. In this paper we introduce the OXC framework, which offers a methodology for XML data cleaning based on a uniform representation of domain knowledge through an ontology. We describe how to define XML-related data quality metrics based on this representation of domain knowledge, and we define several metrics related to the completeness dimension of data quality.
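The abstract does not spell out the OXC completeness metrics. The following is only a minimal sketch of the general idea, assuming the simplest possible notion: completeness is the fraction of properties that the ontology expects an element to have which are actually present (and non-empty) in the XML document. The `ONTOLOGY` dictionary and the `completeness` function are hypothetical illustrations, not the paper's definitions or API; a real ontology would be far richer than a map from concept to required child tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for domain knowledge: for each concept (element tag),
# the set of properties (child tags) an instance is expected to have.
ONTOLOGY = {
    "person": {"name", "birthdate", "address"},
    "address": {"street", "city", "zip"},
}


def completeness(xml_text: str, ontology: dict[str, set[str]]) -> float:
    """Fraction of ontology-expected child elements that are present and non-empty."""
    root = ET.fromstring(xml_text)
    expected = 0
    present = 0
    for elem in root.iter():  # includes the root element itself
        required = ontology.get(elem.tag)
        if required is None:
            continue  # element not described by the ontology
        # A child counts as present if it has non-blank text or sub-elements.
        child_tags = {
            c.tag for c in elem if (c.text and c.text.strip()) or len(c)
        }
        expected += len(required)
        present += len(required & child_tags)
    # With nothing expected, treat the document as trivially complete.
    return present / expected if expected else 1.0


doc = """<people>
  <person><name>Ada</name><address><city>Rome</city><zip></zip></address></person>
</people>"""

print(completeness(doc, ONTOLOGY))  # 2 of 3 person properties + 1 of 3 address properties -> 0.5
```

Counting an empty `<zip>` element as missing is a design choice of this sketch; a metric could equally distinguish "element absent" from "element present but empty", which is one way completeness measures over semistructured data can differ.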