A Translation from the HTML DTD into a Regular Hedge Grammar

  • Authors:
  • Takuya Nishiyama; Yasuhiko Minamide

  • Affiliations:
  • Department of Computer Science, University of Tsukuba

  • Venue:
  • CIAA '08 Proceedings of the 13th international conference on Implementation and Applications of Automata
  • Year:
  • 2008

Abstract

The PHP string analyzer developed by the second author approximates the string output of a program with a context-free grammar. By developing a procedure to decide inclusion between context-free and regular hedge languages, Minamide and Tozawa applied this analysis to checking the validity of dynamically generated XHTML documents. In this paper, we consider the problem of checking the validity of dynamically generated HTML documents instead of XHTML documents. HTML is specified not by an XML schema language but by an SGML DTD, and several kinds of tags can be omitted in HTML documents. We formalize a subclass of SGML DTDs and develop a translation into regular hedge grammars, which makes it possible to validate dynamically generated HTML documents. We have implemented this translation and incorporated it into the PHP string analyzer. The experimental results show that validation through this translation works well in practice.
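
To give a rough idea of what validating against a regular hedge grammar involves, here is a minimal sketch in Python. It is not the translation described in the paper and does not handle omitted tags. It assumes a toy, DTD-like grammar in which each element name has one content model, written as a regular expression over the names of its children. The names `Node`, `CONTENT_MODELS`, and `is_valid` are hypothetical and chosen only for this illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """An element with a name and an ordered list of child elements."""
    name: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical toy grammar (an assumption, not taken from the HTML DTD):
# each element name maps to a regular expression over its children's names,
# where every child name is followed by a single space.
CONTENT_MODELS = {
    "ul": r"(li )+",      # one or more list items
    "li": r"(p |ul )*",   # any mix of paragraphs and nested lists
    "p": r"",             # no element children in this toy grammar
}


def is_valid(node: Node) -> bool:
    """Check that the tree rooted at `node` conforms to CONTENT_MODELS."""
    model = CONTENT_MODELS.get(node.name)
    if model is None:
        return False  # unknown element name
    # Encode the sequence of child names as a string such as "li li ".
    child_names = "".join(child.name + " " for child in node.children)
    if re.fullmatch(model, child_names) is None:
        return False
    # Every child must itself be valid.
    return all(is_valid(child) for child in node.children)
```

For example, `is_valid(Node("ul", [Node("li", [Node("p")]), Node("li")]))` accepts a list with two items, while `is_valid(Node("ul"))` rejects an empty list. The setting of the paper is harder than this sketch: the input is a context-free grammar approximating program output rather than a single tree, and the grammar must account for tags that the SGML DTD allows to be omitted.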