A model for XML instance level integration

Authors:
Aldo Monteiro do Nascimento; Carmem S. Hara
Affiliations:
Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil; Universidade Federal do Paraná (UFPR), Curitiba, PR, Brazil
Venue:
SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
Year:
2008

Abstract

There are two major problems in merging instances from different sources to build a data warehouse: entity identification ambiguity and attribute value conflicts. In this paper we propose a data model that facilitates the resolution of attribute value conflicts by explicitly representing them in the integrated schema. In this model, the data warehouse is an XML tree populated with data imported from one or more XML sources, and its nodes are annotated with provenance information. The annotations serve two purposes. First, they record the origin of every element in the data warehouse, which is essential for assessing the quality of the data and how much trust to place in it. Second, they allow the portion of each source XML tree used to populate the warehouse to be reconstructed. This capability is important when the original document is needed for comparison with a new release from the same source, so that the warehouse can be updated incrementally. We present algorithms for populating the warehouse according to the proposed model and for reconstructing the source data. We also report results from an experimental study of the impact of the annotations on the size of the warehouse.
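The abstract does not spell out the paper's algorithms, so the following is only a minimal Python sketch, under our own assumptions, of the ideas it describes: a warehouse tree whose nodes carry provenance (the set of sources each node came from), conflicting values kept side by side per source instead of being resolved at load time, and reconstruction of a source's contribution from those annotations. The `Node` layout and the `integrate`, `reconstruct` and `conflicts` functions are hypothetical, not taken from the paper. Entity identification is reduced to matching elements by tag plus an assumed `id` attribute, and attributes other than `id` are ignored.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Warehouse node annotated with provenance (hypothetical layout)."""
    tag: str
    key: Optional[str] = None  # value of the assumed "id" attribute, used for identification
    values: dict[str, str] = field(default_factory=dict)  # source id -> text value
    pos: dict[str, int] = field(default_factory=dict)     # source id -> index among siblings
    sources: set[str] = field(default_factory=set)        # provenance: sources containing node
    children: list[Node] = field(default_factory=list)

    def has_conflict(self) -> bool:
        # Attribute value conflict: the sources disagree on this node's text.
        return len(set(self.values.values())) > 1


def integrate(wh: Node, src: ET.Element, source_id: str) -> None:
    """Merge `src` into `wh`, assuming both denote the same entity."""
    wh.sources.add(source_id)
    text = (src.text or "").strip()
    if text:
        wh.values[source_id] = text  # keep every source's value rather than picking one
    for i, child in enumerate(src):
        key = child.get("id")
        # Simplistic identification: same tag and same "id" (or both missing).
        # Assumes at most one id-less child per tag under a given parent.
        match = next((c for c in wh.children
                      if c.tag == child.tag and c.key == key), None)
        if match is None:
            match = Node(child.tag, key)
            wh.children.append(match)
        match.pos[source_id] = i  # remember sibling order in this source
        integrate(match, child, source_id)


def reconstruct(wh: Node, source_id: str) -> Optional[ET.Element]:
    """Rebuild the part of `source_id`'s document that was loaded into the warehouse."""
    if source_id not in wh.sources:
        return None
    elem = ET.Element(wh.tag)
    if wh.key is not None:
        elem.set("id", wh.key)
    if source_id in wh.values:
        elem.text = wh.values[source_id]
    own = [c for c in wh.children if source_id in c.sources]
    for c in sorted(own, key=lambda n: n.pos[source_id]):  # restore source sibling order
        elem.append(reconstruct(c, source_id))
    return elem


def conflicts(wh: Node, path: str = "") -> list[tuple[str, dict[str, str]]]:
    """List nodes whose sources disagree, together with each source's value."""
    here = f"{path}/{wh.tag}" + (f"[@id='{wh.key}']" if wh.key else "")
    found = [(here, dict(wh.values))] if wh.has_conflict() else []
    for c in wh.children:
        found += conflicts(c, here)
    return found


# Usage: two hypothetical sources that describe person 1 with different cities.
a = ET.fromstring('<people><person id="1"><name>Ann</name>'
                  '<city>Curitiba</city></person></people>')
b = ET.fromstring('<people><person id="2"><name>Bob</name></person>'
                  '<person id="1"><name>Ann</name><city>Londrina</city></person></people>')
wh = Node("people")
integrate(wh, a, "A")
integrate(wh, b, "B")
print(conflicts(wh))
print(ET.tostring(reconstruct(wh, "B"), encoding="unicode"))
```

In this example the `name` of person 1 agrees across sources and is stored once per source without a conflict, while `city` is reported as a conflict with both values attached to their origins. Because each node also records its position in every source, `reconstruct` returns the loaded portion of a source in its original sibling order, which is what a comparison against a new release of that source would need.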