Learning to summarise XML documents using content and structure

Authors:
Massih R. Amini; Anastasios Tombros; Nicolas Usunier; Mounia Lalmas; Patrick Gallinari
Affiliations:
University Pierre and Marie Curie, Paris, France; Queen Mary, University of London, London, United Kingdom; University Pierre and Marie Curie, Paris, France; Queen Mary, University of London, London, United Kingdom; University Pierre and Marie Curie, Paris, France
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Citing 4

Advantages of query biased summaries in information retrieval
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

The automatic construction of large-scale corpora for summarization research
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

The use of unlabeled data to improve supervised learning for text summarization
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

An efficient boosting algorithm for combining preferences
The Journal of Machine Learning Research

Cited 3

Investigating the use of summarisation for interactive XML retrieval
Proceedings of the 2006 ACM symposium on Applied computing

The use of summaries in XML retrieval
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries

Structured text retrieval by means of affordances and genre
FDIA'07 Proceedings of the 1st BCS IRSG conference on Future Directions in Information Access

Abstract

Documents formatted in eXtensible Markup Language (XML) are becoming increasingly available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach is that it draws on features not only from the content of documents, but also from their logical structure. We follow a machine-learning-like, sentence-extraction-based summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX dataset. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is also effective and well suited to the task of summarisation in the context of XML documents.
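
To make the idea of treating sentence extraction as an ordering task more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a plain linear scoring function trained with a perceptron-style pairwise update, and the two features (a content-overlap score and a flag for sentences in a structurally prominent element), the toy values, and all function names are hypothetical choices made purely for illustration.

```python
from typing import List


def score(weights: List[float], feats: List[float]) -> float:
    """Linear score of one sentence's feature vector."""
    return sum(w * f for w, f in zip(weights, feats))


def train_pairwise_ranker(features: List[List[float]], labels: List[int],
                          epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Learn weights so that summary sentences (label 1) score above
    non-summary sentences (label 0). Perceptron-style pairwise updates;
    an assumed stand-in, not the method used in the paper."""
    weights = [0.0] * len(features[0])
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    for _ in range(epochs):
        for p in pos:
            for n in neg:
                diff = [pi - ni for pi, ni in zip(p, n)]
                if score(weights, diff) <= 0:  # pair is misordered
                    weights = [w + lr * d for w, d in zip(weights, diff)]
    return weights


def extract_summary(weights: List[float], features: List[List[float]],
                    k: int) -> List[int]:
    """Return the indices of the k top-scoring sentences, in document order."""
    ranked = sorted(range(len(features)),
                    key=lambda i: score(weights, features[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy document: [content_overlap, in_prominent_element]
features = [
    [0.9, 1.0],  # summary sentence
    [0.2, 0.0],
    [0.7, 1.0],  # summary sentence
    [0.8, 0.0],  # strong content signal, but not a summary sentence
    [0.1, 0.0],
]
labels = [1, 0, 1, 0, 0]

weights = train_pairwise_ranker(features, labels)
summary_indices = extract_summary(weights, features, k=2)
print(weights, summary_indices)
```

In this toy setting the learned weight on the structural flag indicates how much that feature contributes to ordering summary sentences ahead of the rest, which is one simple way to read off feature usefulness from a ranking model.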