Web news categorization using a cross-media document graph

  • Authors:
  • José Iria;Fabio Ciravegna;João Magalhães

  • Affiliations:
  • The University of Sheffield, Sheffield, UK;The University of Sheffield, Sheffield, UK;Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal

  • Venue:
  • Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year:
  • 2009

Abstract

In this paper we propose a multimedia categorization framework that exploits information across the different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed of text describing some event (e.g., a car accident) and a picture that either adds information about the real extent of the event (e.g., how damaged the car is) or provides evidence corroborating the text. The framework handles multimedia information by considering not only the document's text and image data but also the layout structure, which determines how a given text block relates to a particular image. The novelties and contributions of the proposed framework are: (1) support for heterogeneous types of multimedia documents; (2) a document-graph representation method; and (3) the computation of cross-media correlations. We applied the framework to the task of categorizing Web news feed data, and our results show a significant improvement over a single-medium framework.
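The abstract does not specify how the document graph is built or how cross-media correlations are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes text blocks and images become graph nodes, layout adjacency becomes an undirected edge, and "cross-media" information is approximated by copying a down-weighted share of a neighbour's features when the neighbour is of a different medium. All names (`Node`, `DocumentGraph`, `cross_media_features`, the `weight` parameter) and the feature values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One content element of a document (hypothetical structure)."""
    node_id: str
    medium: str                 # "text" or "image"
    features: dict[str, float]  # e.g. term weights or visual descriptors


@dataclass
class DocumentGraph:
    """Nodes are content elements; edges stand for layout relations."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())

    def link(self, a: str, b: str) -> None:
        # Undirected edge: the layout places element a next to element b.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def cross_media_features(self, node_id: str, weight: float = 0.5) -> dict[str, float]:
        """Return the node's own features plus weighted features of linked
        nodes of a different medium (an assumed, simplified combination)."""
        node = self.nodes[node_id]
        combined = dict(node.features)
        for neighbour_id in sorted(self.edges[node_id]):
            neighbour = self.nodes[neighbour_id]
            if neighbour.medium == node.medium:
                continue  # only borrow evidence from the other medium
            for name, value in neighbour.features.items():
                key = f"{neighbour.medium}:{name}"
                combined[key] = combined.get(key, 0.0) + weight * value
        return combined


# Toy news page: one text block laid out next to one picture.
graph = DocumentGraph()
graph.add_node(Node("t1", "text", {"accident": 1.0, "car": 2.0}))
graph.add_node(Node("i1", "image", {"edge_density": 0.8}))
graph.link("t1", "i1")

features = graph.cross_media_features("t1")
print(features)
```

The combined feature dictionary could then be passed to any standard classifier. Prefixing borrowed features with the neighbour's medium keeps text and image evidence distinguishable, which is one simple way to let a classifier weigh the two media differently.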