Incremental data integration based on hierarchical metadata registry with data visibility

  • Authors:
  • Dongwon Jeong; Doo-Kwon Baik

  • Affiliations:
  • Software System Laboratory, Department of Computer Science & Engineering, Korea University, 1, 5-ka, Anam-dong, Sungbuk-ku, Seoul 136-701, South Korea (both authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2004

Abstract

A considerable amount of research has been conducted on metadata-based data integration. However, existing approaches incur too high a cost to build an initial guideline. The most important reason is that previous research has not seriously considered the relevant domain properties, such as the data level and the user level. First, when cost is restricted, it is difficult in practice to create a standardized guideline for the entire data set. Thus, the set of data to be integrated must be selected first. However, most databases lack the statistical information needed to select such a set according to its usability. In this paper, we propose the LOG (localization-based global metadata registry) methodology, which builds a guideline and integrates databases progressively while taking the domain properties into account. The key idea is that the priorities of the databases to be integrated are determined by their relationship to the domain properties. We also present an implementation applied to actual databases at the Korea Institute of Science and Technology Information, which builds and manages a large number of science and technology databases in Korea. LOG provides a method for building a metadata registry incrementally and also supports progressive data integration over existing distributed databases. In particular, it enables the successful and efficient creation of a standard guideline when cost is restricted.
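The abstract does not specify how priorities are computed or how the registry grows. The following is only a minimal sketch of the general idea of cost-bounded, priority-driven incremental integration, assuming each database can be given a hypothetical numeric score at the data level and at the user level, plus an integration cost. The weighted-sum priority, the greedy budget selection, and every name used (`Database`, `priority`, `plan_increments`, `integrate`, `build_registry`) are illustrative assumptions, not the LOG algorithm itself.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    # All fields are hypothetical inputs assumed for this sketch.
    name: str
    data_level: float   # assumed usability score of the data itself (0..1)
    user_level: float   # assumed score of how heavily users rely on it (0..1)
    cost: float         # assumed cost of standardizing and integrating it
    elements: list = field(default_factory=list)  # data element names it exposes


def priority(db: Database, w_data: float = 0.5, w_user: float = 0.5) -> float:
    """Hypothetical priority: weighted sum of the two domain-property scores."""
    return w_data * db.data_level + w_user * db.user_level


def plan_increments(dbs, budget, w_data=0.5, w_user=0.5):
    """Greedy selection: visit databases in descending priority, take each one
    that still fits in the remaining budget, skip the rest, keep scanning."""
    ranked = sorted(dbs, key=lambda d: priority(d, w_data, w_user), reverse=True)
    selected, spent = [], 0.0
    for db in ranked:
        if spent + db.cost <= budget:
            selected.append(db.name)
            spent += db.cost
    return selected, spent


def integrate(registry: dict, db: Database) -> dict:
    """Add one database to a toy registry mapping element name -> source DBs."""
    for elem in db.elements:
        registry.setdefault(elem, set()).add(db.name)
    return registry


def build_registry(dbs, budget, w_data=0.5, w_user=0.5):
    """One increment: pick databases within budget, then register them."""
    by_name = {d.name: d for d in dbs}
    selected, spent = plan_increments(dbs, budget, w_data, w_user)
    registry: dict = {}
    for name in selected:
        integrate(registry, by_name[name])
    return registry, selected, spent


if __name__ == "__main__":
    # Hypothetical example databases; not taken from the paper.
    dbs = [
        Database("papers", 0.9, 0.8, 5, ["title", "author", "year"]),
        Database("patents", 0.6, 0.7, 4, ["title", "inventor", "year"]),
        Database("reports", 0.3, 0.2, 3, ["title", "agency"]),
    ]
    registry, selected, spent = build_registry(dbs, budget=9)
    print(selected, spent)
    for elem, sources in sorted(registry.items()):
        print(elem, sorted(sources))
```

With a budget of 9, the two highest-priority databases fit and the lowest-priority one is deferred; a later increment with more budget could call `integrate` on the deferred database to extend the same registry rather than rebuilding it. The greedy rule is chosen here only for simplicity and does not guarantee the best use of the budget.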