Detection of layout-purpose TABLE tags based on machine learning

  • Authors: Hidehiko Okada; Taiki Miura
  • Affiliations: Kyoto Sangyo University, Kyoto, Japan; Kyoto Sangyo University, Kyoto, Japan
  • Venue: UAHCI'07 Proceedings of the 4th international conference on Universal access in human-computer interaction: applications and services
  • Year: 2007

Abstract

To make webpages more accessible to people with disabilities, <table> tags should not be used to lay out document content. Evaluating the accessibility of a webpage therefore involves checking whether it includes layout-purpose <table> tags. Precise automated detection of layout-purpose <table> tags in HTML sources remains a research challenge because it requires more than simply checking whether specific tags or attributes are present in the sources. We propose a detection method based on machine learning. The method derives a <table> tag classifier that deduces whether a given <table> tag is a layout-purpose one or a table-purpose one. We have developed a system that derives classification rules with ID3. The system derives a decision tree from a set of learning data (<table> tags whose purposes are known) and classifies the <table> tags in webpages under evaluation. Classification accuracy was evaluated by cross-validation on 200 test data items collected from the Web. The evaluation revealed that 1) the tags can be roughly classified using the values of four attributes: border, the number of rows, the number of tags that appear ahead of the <table> tag, and the nesting of <table> tags (these attributes are more likely to appear in the upper layers of the decision trees), and 2) the accuracy rate is about 90% for the 200 test data items.
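
The abstract names ID3 but gives no implementation details, so the following is only a minimal sketch of how an ID3-style decision tree could be derived for this task. It assumes that the four attributes mentioned above have been discretized into categorical values beforehand. The attribute names (border, rows, preceding_tags, nested), their value sets, and the eight training rows are hypothetical and invented purely for illustration; they are not the authors' data or system.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Return a nested-dict decision tree, or a class label at a leaf."""
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:                      # no attributes left: majority vote
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {"attr": best, "default": majority, "branches": {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = id3(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return tree

def classify(tree, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical, hand-made learning data (NOT from the paper).
attrs = ["border", "rows", "preceding_tags", "nested"]
raw = [
    ("nonzero", "many", "many", "no",  "table"),
    ("nonzero", "few",  "many", "no",  "table"),
    ("zero",    "few",  "few",  "yes", "layout"),
    ("zero",    "few",  "few",  "no",  "layout"),
    ("zero",    "many", "many", "no",  "table"),
    ("zero",    "many", "few",  "yes", "layout"),
    ("nonzero", "many", "many", "yes", "table"),
    ("zero",    "few",  "many", "yes", "layout"),
]
rows = [dict(zip(attrs, r[:4])) for r in raw]
labels = [r[4] for r in raw]

tree = id3(rows, labels, attrs)
unseen = {"border": "nonzero", "rows": "few",
          "preceding_tags": "many", "nested": "yes"}
print(classify(tree, unseen))
```

In a real evaluation, the attribute values would be extracted from the HTML source of each <table> tag, and accuracy would be estimated by cross-validation as described in the abstract rather than by inspecting a single unseen row.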