A Case-Based Transformation from HTML to XML

  • Authors: Masayuki Umehara; Koji Iwanuma
  • Venue: IDEAL '00 Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning, Data Mining, Financial Engineering, and Intelligent Agents
  • Year: 2000

Abstract

Recently, a huge number of HTML documents have been created on the Internet, and together they constitute a treasury of information. HTML, however, is designed mainly for reading in browsers and is not well suited to machine processing; XML was proposed as a solution to this problem. In this paper, we give a case-based method for transforming HTML documents into XML documents. Actual Web sites contain many series of HTML pages, and the pages within a series usually have quite similar structures. A case-based transformation is therefore a promising practical method for semi-automatic transformation from HTML to XML. Through experimental evaluations, we show that this case-based method achieves highly accurate transformation: 85% of 80 actual pages were transformed correctly.
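
The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea it states: pages in a series share a structure, so one hand-labelled page (a "case") can guide the transformation of the others. Everything here is an assumption made for illustration and not the authors' method: the names `PathTextParser`, `extract`, `learn_case`, and `transform`, the use of tag paths as the structural key, and the `record` root element are all hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class PathTextParser(HTMLParser):
    """Collect (tag path, text) pairs, e.g. ("html/body/h1", "Alpha")."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, discarding unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def extract(html):
    """Return the (tag path, text) pairs found in an HTML string."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_case(case_html, labels):
    """Build a tag path -> XML element name mapping from one labelled page.

    `labels` (hypothetical input) maps a text string that occurs in the
    case page to the XML element name a person assigned to it.
    """
    return {path: labels[text]
            for path, text in extract(case_html) if text in labels}


def transform(html, case, root="record"):
    """Apply a learned case to another page with the same structure."""
    parts = [f"<{case[path]}>{escape(text)}</{case[path]}>"
             for path, text in extract(html) if path in case]
    return f"<{root}>" + "".join(parts) + f"</{root}>"


# One labelled page of a series serves as the case ...
case_page = "<html><body><h1>Alpha</h1><p>First item</p></body></html>"
case = learn_case(case_page, {"Alpha": "title", "First item": "description"})

# ... and is reused for a structurally similar page of the same series.
new_page = "<html><body><h1>Beta</h1><p>Second item</p></body></html>"
print(transform(new_page, case))
# prints <record><title>Beta</title><description>Second item</description></record>
```

Exact tag-path matching is the simplest possible notion of "similar structure" and fails as soon as a page deviates from the case, for example through void elements such as `<br>` or an extra wrapper element. The paper reports 85% of pages transformed correctly, so a real system presumably needs a more tolerant matching step; this sketch does not attempt one.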