Integrating heterogeneous data warehouses using XML technologies

  • Authors:
  • Frank S.C. Tseng; Chia-Wei Chen

  • Affiliations:
  • Department of Information Management, National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan, ROC (both authors)

  • Venue:
  • Journal of Information Science
  • Year:
  • 2005

Abstract

Data warehousing has been widely adopted by contemporary enterprises. For inter-organizational information sharing, the need for research on integrating heterogeneous data warehouses cannot be overemphasized. This makes it urgent to establish a systematic methodology for integrating heterogeneous data warehouses via the Internet or proprietary extranets. Traditionally, researchers have employed a canonical format as the medium for logical data integration among heterogeneous systems. In this paper, to fully exploit the power of the Internet, we propose a framework and develop a prototype that integrates heterogeneous data warehouses using XML technologies. We first formally define the elements of data warehousing and discuss the semantic conflicts that can occur among heterogeneous data cubes. We then propose the system architecture and the resolution procedures for each kind of semantic conflict. For local data cubes with different schemas, we define a global XML Schema that integrates the local cube structures, and transform each local cube into an XML document conforming to that global schema. These XML documents are then combined by pre-defined XQuery commands into a single unified XML document, which can be regarded as the global cube. The integrated global cube can be stored and manipulated directly in native XML databases. The proposed methodology enables global users to browse the global cube, or pose multi-dimensional expressions (MDX) against it, and obtain results in the same way as they would locally.
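The abstract describes the transform-then-merge pipeline only at a high level. The following is a minimal Python sketch of that idea under stated assumptions. It is not the authors' implementation. The global vocabulary is hypothetical: a `<cube>` of `<cell>` elements, where dimensions are attributes and the measure is the text. It is not the paper's actual global XML Schema. The merge step uses the standard-library `xml.etree.ElementTree` in place of the pre-defined XQuery commands. The function names `local_cube_to_xml` and `merge_cubes` are illustrative only. Two simple semantic conflicts are resolved during transformation: differing dimension names, handled with a name map, and differing measure units, handled with a scale factor.

```python
import xml.etree.ElementTree as ET

# Hypothetical global vocabulary (an assumption, not the paper's XML Schema):
# <cube source="..."><cell dim1="..." dim2="...">measure</cell>...</cube>


def local_cube_to_xml(cube_name, cells, dim_map, measure_name, scale=1.0):
    """Transform one local cube into the hypothetical global vocabulary.

    cells: list of dicts mapping local dimension names to members, plus the measure.
    dim_map: local dimension name -> global dimension name (naming conflicts).
    scale: factor applied to the measure (e.g. thousands -> units).
    """
    root = ET.Element("cube", source=cube_name)
    for cell in cells:
        # Rename dimensions to their global names; the measure is not in dim_map.
        attrs = {dim_map[k]: str(v) for k, v in cell.items() if k in dim_map}
        el = ET.SubElement(root, "cell", attrs)
        el.text = str(cell[measure_name] * scale)
    return root


def merge_cubes(docs, dims):
    """Union transformed documents into one global cube.

    Cells that share the same coordinates have their measures summed, which is
    one plausible aggregation choice, assumed here for illustration.
    """
    totals = {}
    for doc in docs:
        for cell in doc.iter("cell"):
            key = tuple(cell.get(d) for d in dims)
            if None in key:
                raise ValueError("cell is missing a global dimension: %r" % cell.attrib)
            totals[key] = totals.get(key, 0.0) + float(cell.text)
    glob = ET.Element("cube", source="global")
    for key in sorted(totals):
        el = ET.SubElement(glob, "cell", dict(zip(dims, key)))
        el.text = str(totals[key])
    return glob


if __name__ == "__main__":
    # Two hypothetical sites with different dimension names and measure units.
    site_a = local_cube_to_xml(
        "site_a",
        [{"prod": "A", "yr": 2004, "amt": 10}, {"prod": "B", "yr": 2004, "amt": 5}],
        {"prod": "product", "yr": "year"},
        "amt",
    )
    site_b = local_cube_to_xml(
        "site_b",
        [{"item": "A", "year": 2004, "sales_k": 2}],  # sales recorded in thousands
        {"item": "product", "year": "year"},
        "sales_k",
        scale=1000,
    )
    global_cube = merge_cubes([site_a, site_b], ["product", "year"])
    print(ET.tostring(global_cube, encoding="unicode"))
```

In the paper's setting, the merge would be expressed as XQuery over documents held in a native XML database, and queries would be posed in MDX. The dictionary-based merge shown here only illustrates what such a step has to accomplish: align coordinates under the global dimension names, then aggregate the measures.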