Integrating unnormalised semi-structured data sources

  • Authors: Sasivimol Kittivoravitkul; Peter McBrien
  • Affiliations: Department of Computing, Imperial College London, London (both authors)
  • Venue: CAiSE'05: Proceedings of the 17th International Conference on Advanced Information Systems Engineering
  • Year: 2005

Abstract

Semi-structured data sources, such as XML, HTML or CSV files, present special problems when performing data integration. In addition to handling the hierarchical structure of semi-structured data, data integration must deal with its redundancy: the same fact may be repeated within a data source, yet should map to a single fact in a global integrated schema. We term semi-structured data containing such redundancy an unnormalised data source, and we define a normal form for semi-structured data that may be used when defining global schemas. We introduce special functions that relate object identifiers used in the global data model to object identifiers in unnormalised data sources, and demonstrate how to use these functions in query processing, update processing and integration of these data sources.
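
The abstract does not spell out the functions it introduces, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical XML source in which an author element is repeated under several books, uses positional paths as source object identifiers, and assumes that the author's name is enough to identify the single global fact. The names `build_oid_maps`, `to_global` and `to_source` are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical unnormalised source: the fact "Ann is an author" appears twice.
SOURCE = """
<library>
  <book title="Databases">
    <author name="Ann" email="ann@example.org"/>
  </book>
  <book title="XML">
    <author name="Ann" email="ann@example.org"/>
    <author name="Bob" email="bob@example.org"/>
  </book>
</library>
"""


def build_oid_maps(root):
    """Relate source object identifiers to global ones, in both directions.

    Assumption: an author's name acts as the key of the global object.
    """
    to_global = {}                  # source oid -> global oid (many-to-one)
    to_source = defaultdict(list)   # global oid -> source oids (one-to-many)
    for b, book in enumerate(root.findall("book"), start=1):
        for a, author in enumerate(book.findall("author"), start=1):
            source_oid = f"/library/book[{b}]/author[{a}]"
            global_oid = ("author", author.get("name"))
            to_global[source_oid] = global_oid
            to_source[global_oid].append(source_oid)
    return to_global, dict(to_source)


root = ET.fromstring(SOURCE)
to_global, to_source = build_oid_maps(root)

# Three author nodes in the source collapse to two global author objects.
for global_oid, source_oids in to_source.items():
    print(global_oid, "<-", source_oids)
```

Under these assumptions, a query against the global schema would read any one of the source nodes listed for a global object, whereas an update to a global object would need to be applied to every listed source node to keep the repeated copies consistent.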