Structured multimedia document classification

  • Authors:
  • Ludovic Denoyer; Jean-Noël Vittaut; Patrick Gallinari; Sylvie Brunessaux; Stephan Brunessaux

  • Affiliations:
  • University of Paris 6, Paris, France; University of Paris 6, Paris, France; University of Paris 6, Paris, France; EADS S&DE, Val de Reuil, France; EADS S&DE, Val de Reuil, France

  • Venue:
  • Proceedings of the 2003 ACM Symposium on Document Engineering
  • Year:
  • 2003

Abstract

We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to take into account simultaneously the structural and the content information present in a structured document, and to cope with different types of content (text, image, etc.). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies pornographic sites written in 8 European languages. The corpus used in these experiments was developed by EADS in the context of a large Web site filtering application.