Automated conversion of table-based websites to structured stylesheets using table recognition and clone detection

  • Authors:
  • Andy Y. Mao; James R. Cordy; Thomas R. Dean

  • Affiliations:
  • Queen's University, Kingston, Ontario, Canada; Queen's University, Kingston, Ontario, Canada; Queen's University, Kingston, Ontario, Canada

  • Venue:
  • CASCON '07 Proceedings of the 2007 conference of the center for advanced studies on Collaborative research
  • Year:
  • 2007

Abstract

Web standards such as XHTML and CSS are rapidly coming into practice and have many advantages, including compatibility, consistency across browsers, and increased ease of maintenance. Unfortunately, large numbers of existing websites still use the deprecated table-based layout style, in which page style is unique to each page. Existing tools for automating the transition to stylesheets provide little help: they convert page by page, using a flattened structure and local inline styles rather than a common CSS stylesheet. This approach ignores hierarchical structure and defeats the main purpose of moving to the new standard, losing all of its advantages. In this work we present an automated method for converting table-based layout websites to standards-compliant, modern CSS stylesheet-based websites using a two-step process. Pages of the site are first converted page by page using table recognition technology to preserve hierarchical structure and layout semantics in local styles. Software clone detection technology is then used to recognize common layout styles in the pages and to extract and minimize them into a common CSS stylesheet for the site. The result is a maintainable, efficient, modern, standards-compliant website with the same look and feel as the original but with all the maintenance advantages of a custom-programmed new site.
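
The second step, lifting styles shared across pages into one site-wide stylesheet, can be illustrated with a minimal Python sketch. This is not the authors' tool: exact matching of normalized inline styles stands in here for software clone detection, and the names `normalize`, `extract_stylesheet`, and the generated class names `c1`, `c2`, ... are hypothetical. It assumes each page has already been converted so that its layout lives in local `style="..."` attributes, and that no element already carries a `class` attribute.

```python
import re

# Matches a double-quoted inline style attribute, including its leading whitespace.
STYLE_ATTR = re.compile(r'\sstyle="([^"]*)"')


def normalize(style):
    """Canonicalize an inline style so equivalent declarations compare equal.

    Property names are lowercased, whitespace is collapsed, a repeated
    property keeps its last value, and properties are sorted by name.
    Simplification: values containing ';' (e.g. data URLs) are not handled.
    """
    decls = {}
    for part in style.split(";"):
        prop, sep, value = part.partition(":")
        if sep and prop.strip():
            decls[prop.strip().lower()] = " ".join(value.split())
    return "; ".join(f"{p}: {v}" for p, v in sorted(decls.items()))


def extract_stylesheet(pages):
    """Replace identical inline styles across all pages with shared classes.

    `pages` maps a page name to its HTML text. Returns the common CSS text
    and a dict of rewritten pages.
    """
    classes = {}  # normalized declarations -> generated class name

    def replace(match):
        key = normalize(match.group(1))
        if not key:
            return ""  # drop empty style attributes
        cls = classes.setdefault(key, f"c{len(classes) + 1}")
        return f' class="{cls}"'

    converted = {page: STYLE_ATTR.sub(replace, html) for page, html in pages.items()}
    css = "\n".join(f".{cls} {{ {decls}; }}" for decls, cls in classes.items())
    return css, converted


pages = {
    "a.html": '<div style="color: red; margin:0">A</div>',
    "b.html": '<p style="MARGIN: 0;color:red;">B</p>',
}
css, converted = extract_stylesheet(pages)
print(css)
print(converted["a.html"])
print(converted["b.html"])
```

In this example the two pages spell the same style differently, yet both end up referencing a single shared rule. A real clone detector would also have to recognize near-miss styles that differ in only a few declarations, which exact matching cannot do.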