Transforming web pages to become standard-compliant through reverse engineering

  • Authors:
  • Benfeng Chen; Vincent Y. Shen

  • Affiliations:
  • Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong;Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

  • Venue:
  • W4A '06 Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A): Building the mobile web: rediscovering accessibility?
  • Year:
  • 2006

Abstract

Developing Web pages following established standards can make the information more accessible, their rendering more efficient, and their processing by computer applications easier. Unfortunately, more than 95% of the existing Web pages today are not "valid" in that they do not follow some of the recommendations (standards) of the World Wide Web Consortium (W3C). Fixing any Web page to make it standard-compliant is a major undertaking. There is now an open-source tool called HTML Tidy which will attempt to fix the invalid HTML code automatically. However, Tidy often changes the Web page's appearance after processing. It is not an effective tool to transform existing Web pages to make them standard-compliant.

In this paper we report the design and implementation of PURE, a tool that cleans up an HTML document through reverse engineering. PURE starts with the rendering result of a given Web page and generates valid HTML code and CSS automatically to produce the same appearance. It is found to be effective for many existing Web pages. A prototype is now available for public testing and comments.
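
The general idea of regenerating markup from a rendering result, rather than repairing the original source, can be illustrated with a minimal sketch. This is not PURE's actual algorithm, which the abstract does not describe. It assumes a hypothetical layout engine has already reported each visible text box with its position and basic style. The `RenderedBox` type, the `regenerate` function, and all field names are illustrative assumptions, and the output targets a modern HTML doctype purely for brevity.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class RenderedBox:
    """One visible text box as a layout engine might report it (hypothetical)."""
    text: str
    x: int
    y: int
    width: int
    height: int
    font_size: int = 16
    color: str = "#000000"


def regenerate(boxes, title="Regenerated page"):
    """Emit a small HTML document plus CSS that places each box where it was rendered."""
    rules = {}  # CSS declaration string -> class name, so identical styles share a class
    body = []
    for box in boxes:
        decl = (
            f"position:absolute;left:{box.x}px;top:{box.y}px;"
            f"width:{box.width}px;height:{box.height}px;"
            f"font-size:{box.font_size}px;color:{box.color};"
        )
        cls = rules.setdefault(decl, f"b{len(rules)}")
        # Escape the text so the generated markup stays well-formed.
        body.append(f'<div class="{cls}">{escape(box.text)}</div>')
    css = "\n".join(f".{cls}{{{decl}}}" for decl, cls in rules.items())
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n<style>\n{css}\n</style>\n</head>\n"
        "<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )
```

Because the markup is built only from what was rendered, any malformed constructs in the original source never reach the output. A real tool would also need to recover images, links, tables, and document structure, none of which this sketch attempts.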