Learning-based summarisation of XML documents

Authors:
Massih R. Amini (University Pierre and Marie Curie, Paris, France 75015)
Anastasios Tombros (Department of Computer Science, Queen Mary, University of London, London, United Kingdom E1 4NS)
Nicolas Usunier (University Pierre and Marie Curie, Paris, France 75015)
Mounia Lalmas (Department of Computer Science, Queen Mary, University of London, London, United Kingdom E1 4NS)
Venue:
Information Retrieval
Year:
2007

Abstract

Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach to the summarisation of XML documents. Its novelty is that it draws on features not only from the content of documents but also from their logical structure. We follow a machine-learning technique based on sentence extraction. To determine which features are most effective for producing summaries, the approach treats sentence extraction as an ordering task. We evaluated our summarisation model on the INEX and SUMMAC datasets. The results show that including features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is effective and well suited to summarising XML documents. Because the approach is generic, it applies not only to entire documents but also to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
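
As a rough illustration of treating sentence extraction as an ordering task, the sketch below learns a linear scoring function from sentence pairs, so that summary sentences score above non-summary sentences, and then extracts the top-ranked sentences. It is a minimal sketch under stated assumptions, not the model described in the paper: the perceptron-style pairwise update, the two features (a content feature `title_overlap` and a structural feature `in_first_paragraph`), the toy data, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(weights: Vector, features: Vector) -> float:
    """Linear score of a sentence: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(
    sentences: List[Tuple[Vector, int]], epochs: int = 20, lr: float = 0.1
) -> List[float]:
    """Learn weights so that summary sentences (label 1) outscore the rest (label 0).

    Assumption: a simple perceptron-style update on misordered pairs,
    used here only to illustrate learning an ordering.
    """
    dim = len(sentences[0][0])
    weights = [0.0] * dim
    positives = [f for f, label in sentences if label == 1]
    negatives = [f for f, label in sentences if label == 0]
    for _ in range(epochs):
        for pos in positives:
            for neg in negatives:
                # Update only when the pair is ranked in the wrong order (or tied).
                if score(weights, pos) <= score(weights, neg):
                    weights = [
                        w + lr * (p - n) for w, p, n in zip(weights, pos, neg)
                    ]
    return weights


def extract_summary(
    weights: Vector, candidates: List[Tuple[str, Vector]], k: int
) -> List[str]:
    """Return the k highest-scoring sentences, best first."""
    ranked = sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical features per sentence: [title_overlap, in_first_paragraph].
toy_training = [
    ([0.8, 1.0], 1),
    ([0.6, 1.0], 1),
    ([0.7, 0.0], 0),
    ([0.1, 0.0], 0),
    ([0.3, 1.0], 0),
]
weights = train_pairwise_ranker(toy_training)

candidates = [
    ("A", [0.9, 0.0]),
    ("B", [0.5, 1.0]),
    ("C", [0.2, 0.0]),
]
summary = extract_summary(weights, candidates, k=2)
print(summary)
```

In this toy setting the learned weight on the structural feature decides which candidate ranks first, which mirrors the abstract's claim only in spirit; real features and training data would come from the XML collection being summarised.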