View inference for heterogeneous XML information integration

  • Authors: Euna Jeong; Chun-Nan Hsu

  • Affiliations: School of Computer Science and Engineering, Inha University, 253 YounHyun-Dong Nam-Gu Incheon 402-751, South Korea; Institute of Information Science, Academia Sinica, Nankang 115 Taipei, Taiwan

  • Venue: Journal of Intelligent Information Systems - Special issue on web intelligence
  • Year: 2003

Abstract

This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based content and perform federated search. Based on a tree grammar inference technique, the approach derives an integrated view of XML DTDs in an information integration framework. The derivation takes advantage of naming and structural similarities among DTDs in similar domains. The complete approach consists of three main steps. (1) DTD clustering groups DTDs from similar domains into classes. (2) The schema learner applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in each class produced by the clustering step. (3) The minimizer optimizes the rules generated by the schema learner, transforms them into an integrated view, and generates source descriptions. We have implemented the approach in a system called DEEP and tested the system on several domains. Experimental results reveal that this system can effectively and efficiently integrate radically different DTDs.
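
To make the first step (DTD clustering) more concrete, here is a minimal sketch in Python. It is not the DEEP implementation: the abstract does not say how similarity is measured, so the sketch assumes Jaccard overlap of declared element names as a stand-in for naming similarity and ignores structural similarity. The function names, the sample DTDs, and the 0.3 threshold are all hypothetical.

```python
import re

def element_names(dtd_text):
    """Return the set of element names declared in a DTD string."""
    return set(re.findall(r"<!ELEMENT\s+([\w.:-]+)", dtd_text))

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_dtds(dtds, threshold=0.3):
    """Greedy clustering (an assumption, not the paper's algorithm):
    each DTD joins the first class that has a member whose name
    similarity reaches the threshold, else it starts a new class."""
    names = {key: element_names(text) for key, text in dtds.items()}
    classes = []
    for key in dtds:
        for cls in classes:
            if any(jaccard(names[key], names[m]) >= threshold for m in cls):
                cls.append(key)
                break
        else:
            classes.append([key])
    return classes

# Hypothetical sample input: two book DTDs and one weather DTD.
SAMPLE_DTDS = {
    "books_a": "<!ELEMENT book (title, author)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT author (#PCDATA)>",
    "books_b": "<!ELEMENT book (title, price)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT price (#PCDATA)>",
    "weather": "<!ELEMENT forecast (city, temp)> <!ELEMENT city (#PCDATA)> "
               "<!ELEMENT temp (#PCDATA)>",
}

print(cluster_dtds(SAMPLE_DTDS))  # [['books_a', 'books_b'], ['weather']]
```

In this sketch the two book DTDs share two of four distinct element names (similarity 0.5) and fall into one class, while the weather DTD shares none and forms its own. In the pipeline described above, each resulting class would then be passed to the schema learner.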