Information retrieval from distributed semistructured documents using metadata interface

  • Authors:
  • Guija Choe; Young-Kwang Nam; Joseph Goguen; Guilian Wang

  • Affiliations:
  • Department of Computer Science, Yonsei University, Wonju, Korea; Department of Computer Science, Yonsei University, Wonju, Korea; Department of Computer Science and Engineering, UCSD, La Jolla, CA; Department of Computer Science and Engineering, UCSD, La Jolla, CA

  • Venue:
  • KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents
  • Year:
  • 2006

Abstract

We describe a method for retrieving information from distributed, heterogeneous semistructured documents and its implementation in the metadata interface DDXMI (Distributed Document XML Metadata Interface). From a user query posed over the global schema, the system generates local queries suited to each local schema and displays the results of those queries. Three components generate the local queries: mappings between the global schema and the local schemas (extracted from the local documents if not given), path substitution, and node identification, which resolves the heterogeneity among nodes that share the same label, a situation common in semistructured data. The system uses Quilt as its XML query language. An experiment is reported over three local semistructured documents, ‘thesis’, ‘reports’, and ‘journal’, with ‘article’ as the global schema. The prototype was developed on Windows with Java and JavaCC.
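The abstract does not give DDXMI's internal data structures, so the following is only a minimal Java sketch of how path substitution over a global-to-local mapping might work. It assumes a mapping keyed by full root-to-node paths; the class name `PathSubstitution`, the mapping format, and the sample ‘thesis’ paths are all hypothetical. Keying on full paths rather than bare labels is one simple way to keep same-label nodes (here, two different `name` elements) apart, which is the kind of ambiguity node identification is meant to resolve.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not DDXMI's actual code): rewrites a path written
 * against the global schema into the corresponding local-schema path.
 * Assumed mapping format: full global path -> full local path.
 */
public class PathSubstitution {

    /**
     * Replaces the longest mapped prefix of globalPath with its local
     * counterpart and keeps the unmapped suffix unchanged (an assumption:
     * unmapped child labels are taken to be the same in the local schema).
     * Because keys are full paths, "article/author/name" and
     * "article/publisher/name" are substituted independently.
     * Returns null if no prefix of the path is mapped.
     */
    public static String substitute(String globalPath, Map<String, String> mapping) {
        String prefix = globalPath;
        String rest = "";
        while (!prefix.isEmpty()) {
            String local = mapping.get(prefix);
            if (local != null) {
                return local + rest;
            }
            int cut = prefix.lastIndexOf('/');
            if (cut < 0) {
                break; // no shorter prefix left to try
            }
            rest = prefix.substring(cut) + rest; // move last step into the suffix
            prefix = prefix.substring(0, cut);
        }
        return null; // no correspondence in this local document
    }

    public static void main(String[] args) {
        // Hypothetical mapping from the 'article' global schema to a 'thesis' document.
        Map<String, String> thesis = new HashMap<>();
        thesis.put("article", "thesis");
        thesis.put("article/author/name", "thesis/student/name");
        thesis.put("article/publisher/name", "thesis/university/name");

        System.out.println(substitute("article/author/name", thesis));    // prints thesis/student/name
        System.out.println(substitute("article/publisher/name", thesis)); // prints thesis/university/name
        System.out.println(substitute("article/title", thesis));          // prints thesis/title
    }
}
```

Longest-prefix matching lets a small mapping cover whole subtrees while still allowing specific nodes to be remapped individually; a real system would apply this to every path expression in the generated local Quilt query.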