Extracting Information from XML Documents by Reverse Generating a DTD

  • Authors:
  • Jong-Seok Jung;Dong-Ik Oh;Yong-Hae Kong;Jong-Keun Ahn

  • Venue:
  • EurAsia-ICT '02 Proceedings of the First EurAsian Conference on Information and Communication Technology
  • Year:
  • 2002

Abstract

Information contained in XML documents cannot be properly interpreted without an appropriate DTD. However, XML documents collected from the web are not always accompanied by their corresponding DTD, which makes extracting information from such sources difficult. In this study, we reverse-generate a DTD from XML sources whose DTD is unknown, and use it to extract information from the XML inputs. The DTD construction module is designed to scan the input XML files in a single pass, whereas most other implementations use a two-pass approach. The developed modules also provide clean Java programming interfaces, so that they can be integrated seamlessly with other web applications.
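
The abstract does not describe the module's algorithm or its Java interface, so the following is only a minimal sketch of how a single-pass DTD inference might look, not the authors' implementation. It assumes a streaming SAX parser from the standard Java library, and the class name `DtdSketch` and the methods `infer` and `toDtd` are hypothetical. For simplicity it emits deliberately loose content models of the form `(a|b)*`, declares every attribute as `CDATA #IMPLIED`, and ignores whitespace-only text.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects element structure during one SAX pass.
class DtdSketch extends DefaultHandler {
    // Element name -> child element names, in first-seen order.
    private final Map<String, Set<String>> children = new LinkedHashMap<>();
    // Element name -> attribute names seen on that element.
    private final Map<String, Set<String>> attributes = new LinkedHashMap<>();
    // Elements that directly contained non-whitespace text.
    private final Set<String> hasText = new LinkedHashSet<>();
    // Stack of currently open elements.
    private final Deque<String> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        children.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        Set<String> attrs = attributes.computeIfAbsent(qName, k -> new LinkedHashSet<>());
        for (int i = 0; i < atts.getLength(); i++) {
            attrs.add(atts.getQName(i));
        }
        if (!open.isEmpty()) {
            children.get(open.peek()).add(qName); // record parent -> child
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace-only text is ignored (a simplifying assumption).
        if (!open.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
            hasText.add(open.peek());
        }
    }

    // Renders the collected structure as loose DTD declarations.
    String toDtd() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Set<String>> e : children.entrySet()) {
            String name = e.getKey();
            Set<String> kids = e.getValue();
            boolean text = hasText.contains(name);
            String model;
            if (kids.isEmpty()) {
                model = text ? "(#PCDATA)" : "EMPTY";
            } else if (text) {
                model = "(#PCDATA|" + String.join("|", kids) + ")*";
            } else {
                model = "(" + String.join("|", kids) + ")*";
            }
            sb.append("<!ELEMENT ").append(name).append(' ').append(model).append(">\n");
            for (String a : attributes.get(name)) {
                sb.append("<!ATTLIST ").append(name).append(' ').append(a)
                  .append(" CDATA #IMPLIED>\n");
            }
        }
        return sb.toString();
    }

    // Parses the XML text once and returns the inferred declarations.
    static String infer(String xml) throws Exception {
        DtdSketch handler = new DtdSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.toDtd();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(infer(
            "<lib><book id=\"1\"><title>A</title></book><book/></lib>"));
    }
}
```

Because the handler only accumulates sets while the parser streams through the input, the document is read once and never held in memory as a tree. A stricter generator would also have to infer ordering and occurrence indicators (`?`, `+`, `,`), which this sketch leaves out.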