Visual features in genre classification of HTML

  • Authors:
  • Ryan Levering; Michal Cutler; Lei Yu

  • Affiliations:
  • SUNY at Binghamton, Binghamton, NY; SUNY at Binghamton, Binghamton, NY; SUNY at Binghamton, Binghamton, NY

  • Venue:
  • Proceedings of the eighteenth conference on Hypertext and hypermedia
  • Year:
  • 2007

Abstract

Automatic genre classification has historically focused on textual features extracted from documents. In this research, we investigate whether visual features of HTML documents can improve the classification of fine-grained genres. We compared three feature sets on a genre classification task in the e-commerce domain: one with textual features only, one with HTML features added, and a third with visual features added as well. Our experiments show that adding HTML and visual features yields substantially better classification than textual features alone.
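
The nested design of the three feature sets can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_feature_sets`, the `word:`, `tag:`, and `visual:` key prefixes, and the choice of word counts and tag counts as features are all assumptions made for illustration. Visual features are assumed to be computed elsewhere (for example from a rendered page) and are passed in as a plain dictionary; the sketch does not render anything.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagAndTextCollector(HTMLParser):
    """Collects start-tag counts and text content from one HTML document."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1

    def handle_data(self, data):
        # Note: this also captures script/style contents; a real system
        # would likely filter those out.
        self.text_parts.append(data)


def build_feature_sets(html, visual_features):
    """Return three nested feature dicts: text, text+HTML, text+HTML+visual.

    visual_features is a hypothetical, precomputed mapping of
    feature name -> value; how it is obtained is outside this sketch.
    """
    collector = _TagAndTextCollector()
    collector.feed(html)
    collector.close()

    # Set 1: textual features only (lowercased word counts).
    words = " ".join(collector.text_parts).lower().split()
    textual = {f"word:{w}": n for w, n in Counter(words).items()}

    # Set 2: textual features plus HTML tag counts.
    with_html = dict(textual)
    with_html.update({f"tag:{t}": n for t, n in collector.tag_counts.items()})

    # Set 3: set 2 plus the externally supplied visual features.
    with_visual = dict(with_html)
    with_visual.update({f"visual:{k}": v for k, v in visual_features.items()})

    return textual, with_html, with_visual


page = "<html><body><h1>Buy now</h1><p>Buy shoes</p><img src='a.png'></body></html>"
textual, with_html, with_visual = build_feature_sets(
    page, {"image_area_ratio": 0.25}  # hypothetical visual feature
)
```

Each dictionary could then be vectorized and passed to the same classifier, so that any difference in accuracy is attributable to the added features rather than to the learning algorithm.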