Learning to summarise XML documents using content and structure

  • Authors: Massih R. Amini; Anastasios Tombros; Nicolas Usunier; Mounia Lalmas; Patrick Gallinari

  • Affiliations (in author order): University Pierre and Marie Curie, Paris, France; Queen Mary, University of London, London, United Kingdom; University Pierre and Marie Curie, Paris, France; Queen Mary, University of London, London, United Kingdom; University Pierre and Marie Curie, Paris, France

  • Venue: Proceedings of the 14th ACM International Conference on Information and Knowledge Management
  • Year: 2005

Abstract

Documents formatted in eXtensible Markup Language (XML) are becoming increasingly available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach is that it draws on features not only from the content of documents but also from their logical structure. We follow a machine-learning, sentence-extraction-based summarisation technique. To determine which features are most effective for producing summaries, the approach treats sentence extraction as an ordering task. We evaluated our summarisation model on the INEX dataset. The results show that including features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is effective and well suited to summarising XML documents.
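
To make the idea of sentence extraction as an ordering task more concrete, here is a minimal sketch and not the authors' model. It assumes a linear scoring function trained with a simple pairwise margin update, so that summary sentences are pushed above non-summary sentences from the same document. The two features (a content feature such as title-word overlap, and a structural flag such as "sentence is in the first paragraph of its section"), the toy data, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# One sentence = (feature vector, is_in_reference_summary).
# Hypothetical features: [title_overlap (content), in_first_paragraph (structure)].
Sentence = Tuple[Sequence[float], bool]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score of one sentence."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(docs: List[List[Sentence]], n_features: int,
                   epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Learn weights so summary sentences outrank the others in each document."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for sentences in docs:
            pos = [x for x, y in sentences if y]
            neg = [x for x, y in sentences if not y]
            for p in pos:
                for n in neg:
                    # Update only when the pair is ordered with margin below 1.
                    if score(w, p) - score(w, n) < 1.0:
                        for i in range(n_features):
                            w[i] += lr * (p[i] - n[i])
    return w


def extract(w: Sequence[float], features: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k best-scored sentences, in document order."""
    ranked = sorted(range(len(features)),
                    key=lambda i: score(w, features[i]), reverse=True)
    return sorted(ranked[:k])


# Toy training data (assumed, not taken from INEX).
docs = [
    [([0.8, 1.0], True), ([0.1, 0.0], False), ([0.3, 0.0], False)],
    [([0.2, 1.0], True), ([0.4, 0.0], False)],
]
w = train_pairwise(docs, n_features=2)

# Rank the sentences of an unseen toy document and keep the best one.
new_doc = [[0.5, 0.0], [0.3, 1.0], [0.1, 0.0]]
summary_indices = extract(w, new_doc, k=1)
```

In this toy setting the learned weight on the structural flag ends up larger than the weight on the content feature, which loosely mirrors the abstract's claim that structural features help. Inspecting the learned weights is also one simple way to see which features matter for the ordering.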