DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
There are two major problems in merging instances from different sources to build a data warehouse: entity identification ambiguity and attribute value conflicts. In this paper we propose a data model that facilitates the resolution of attribute value conflicts by representing them explicitly in the integrated schema. In this model, the data warehouse is an XML tree populated with data imported from one or more XML sources, and its nodes are annotated with provenance information. The annotations serve two purposes. First, they record the origin of every element in the data warehouse; this information is essential for judging the quality of the data and how much trust to place in it. Second, they allow the portion of each source XML tree that was used to populate the warehouse to be reconstructed. This capability matters when the original document must be compared with a new release from the same source in order to update the warehouse incrementally. We present algorithms for populating the warehouse according to the proposed model and for reconstructing the source data. We also report results from an experimental study of the impact of the annotations on the size of the warehouse.
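The abstract does not spell out the model's data structures or algorithms, so what follows is only a minimal Python sketch under stated assumptions, not the paper's method. Each warehouse node (the hypothetical class `WNode`) records the set of sources that contributed it. Its text is stored as a map from each distinct value to the sources asserting that value, so attribute value conflicts stay explicit instead of being resolved at load time. Entity identification is deliberately sidestepped: sibling elements are assumed to denote the same entity when they share a tag and an `id` attribute. The helper names `populate` and `reconstruct` are illustrative, and sibling order, attributes other than `id`, and mixed content are ignored.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class WNode:
    """Hypothetical warehouse node annotated with provenance."""
    tag: str
    key: str | None = None  # value of the 'id' attribute (assumed entity key)
    sources: set[str] = field(default_factory=set)  # sources that contributed this node
    values: dict[str, set[str]] = field(default_factory=dict)  # text value -> sources asserting it
    children: list[WNode] = field(default_factory=list)

    def find_child(self, tag: str, key: str | None) -> WNode | None:
        # Assumption: same tag + same 'id' means same entity.
        for child in self.children:
            if child.tag == tag and child.key == key:
                return child
        return None


def populate(wnode: WNode, elem: ET.Element, source: str) -> None:
    """Merge source element `elem` into `wnode`, annotating every node with `source`."""
    wnode.sources.add(source)
    text = (elem.text or "").strip()
    if text:
        # Keep every distinct value with the sources asserting it: conflicts are
        # represented, not resolved.
        wnode.values.setdefault(text, set()).add(source)
    for child in elem:
        key = child.get("id")
        target = wnode.find_child(child.tag, key)
        if target is None:
            target = WNode(child.tag, key)
            wnode.children.append(target)
        populate(target, child, source)


def reconstruct(wnode: WNode, source: str) -> ET.Element | None:
    """Rebuild the portion of `source` that was loaded into the warehouse."""
    if source not in wnode.sources:
        return None
    elem = ET.Element(wnode.tag)
    if wnode.key is not None:
        elem.set("id", wnode.key)
    for value, srcs in wnode.values.items():
        if source in srcs:
            elem.text = value  # the value this particular source contributed
    for child in wnode.children:
        sub = reconstruct(child, source)
        if sub is not None:
            elem.append(sub)
    return elem


# Usage: two sources disagree on the price of book b1.
root = WNode("catalog")
populate(root, ET.fromstring('<catalog><book id="b1"><price>10</price></book></catalog>'), "S1")
populate(root, ET.fromstring(
    '<catalog><book id="b1"><price>12</price></book>'
    '<book id="b2"><price>7</price></book></catalog>'), "S2")

price = root.children[0].children[0]
print(price.values)  # {'10': {'S1'}, '12': {'S2'}}: the conflict is kept explicitly
print(ET.tostring(reconstruct(root, "S1"), encoding="unicode"))
# <catalog><book id="b1"><price>10</price></book></catalog>
```

Because provenance is kept per node and per value, reconstructing a source reduces to a filtered traversal of the warehouse tree. The price of this design is the extra annotation storage, which is the overhead whose effect on warehouse size the experimental study examines.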