A practical approach to extracting DTD-conforming XML documents from heterogeneous data sources

  • Authors:
  • Shyh-Kwei Chen;Ming-Ling Lo;Kun-Lung Wu;Jih-Shyr Yih;Colleen Viehrig

  • Affiliations:
  • IBM, T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, New York, NY 10532, United States (all five authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2006

Abstract

XML documents are becoming popular for business process integration. To achieve interoperability between applications, these documents must also conform to commonly used document type definitions (DTDs). However, most business data are not maintained as XML documents; they are stored in native formats such as database tables or LDAP directories. Hence, middleware is needed to dynamically generate XML documents that conform to predefined DTDs from these data sources. Because industrial consortia and large corporations have created many DTDs, designing middleware that conforms to all of them is both challenging and time-consuming. This problem is particularly acute for small and medium-sized enterprises, because they lack the IT skills to develop such middleware quickly. In this paper, we present XLE, an XML Lightweight Extractor, as a practical approach to dynamically extracting DTD-conforming XML documents from heterogeneous data sources. XLE is based on a framework called DTD source annotation (DTDSA), which treats a DTD as the control structure of a program. The annotations become the program statements, such as functions and assignments, and DTD-conforming XML documents are generated by parsing annotated DTDs. The annotations declaratively describe the mappings between target XML documents and the source data. The XLE engine implements a few basic annotations, providing a practical solution for many small and medium-sized enterprises. XLE is nevertheless designed to be versatile: sophisticated users can plug in their own implementations to access new types of data or to achieve better performance. Heterogeneous data sources can be specified directly in the annotations. A GUI tool is provided to highlight the places where annotations are needed.
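
The idea of treating a DTD as the control structure and annotations as its statements can be illustrated with a small sketch. This is not XLE's actual annotation syntax or engine, which the abstract does not show. The names ANNOTATED_DTD and generate, the "rows" and "value" annotation keys, and the sample tables are all hypothetical, and an in-memory SQLite database merely stands in for one data source.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for one relational data source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, buyer TEXT)")
db.execute("CREATE TABLE items (order_id INTEGER, sku TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Acme")])
db.executemany("INSERT INTO items VALUES (?, ?)", [(1, "A-100"), (1, "B-200")])

# Toy "annotated DTD" (illustrative format, not DTDSA syntax).
# "children" plays the role of the DTD content model.
# "rows" is an annotation that yields one element instance per query row.
# "value" is an annotation that fills the element text from a bound column.
ANNOTATED_DTD = {
    "PO": {"children": ["Buyer", "Item"],
           "rows": "SELECT id, buyer FROM orders WHERE id = :po_id"},
    "Buyer": {"children": [], "value": "buyer"},
    "Item": {"children": [],
             "rows": "SELECT sku FROM items WHERE order_id = :id ORDER BY sku",
             "value": "sku"},
}

def generate(name, env):
    """Walk the DTD from element `name` and return the generated elements."""
    decl = ANNOTATED_DTD[name]
    if "rows" in decl:
        # Run the annotated query; each row extends the current bindings.
        cur = db.execute(decl["rows"], env)
        cols = [c[0] for c in cur.description]
        bindings = [{**env, **dict(zip(cols, row))} for row in cur.fetchall()]
    else:
        bindings = [env]
    out = []
    for b in bindings:
        el = ET.Element(name)
        if "value" in decl:
            el.text = str(b[decl["value"]])
        # The content model drives recursion, like control flow in a program.
        for child in decl["children"]:
            el.extend(generate(child, b))
        out.append(el)
    return out

root = generate("PO", {"po_id": 1})[0]
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In this sketch the output structure comes entirely from the content model, so the generated document follows the declared element order, while the annotations only decide how many instances to emit and what text they carry. Supporting another kind of source, such as an LDAP directory, would mean adding another annotation handler alongside "rows".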