Summarisation of the logical structure of XML documents

Authors:
Zoltán Szlávik;Anastasios Tombros;Mounia Lalmas
Affiliations:
Department of Computer Science, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands;School of Electronic Engineering and Computer Science, Queen Mary University of London, E1 4NS London, United Kingdom;Yahoo! Research Barcelona, Avinguda Diagonal 177, 08018 Barcelona, Spain
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, it is argued that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and directs the user's attention to the sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses methods for structure summarisation that use various features of XML elements to select the document portions to which a user's attention should be directed. An evaluation methodology for structure summarisation is also introduced, and results obtained with various summariser versions are presented and compared with one another. We show that data sets used in information retrieval evaluation can be used effectively to produce high-quality, query-independent structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures.
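
To make the idea of a structure summary concrete, the following is a minimal sketch and not the authors' method. The abstract does not list the features used, so the two features here (amount of text in an element and its depth in the tree) are assumptions, as are the `sec` tag, the `title` attribute, the scoring formula, and the function name `summarise_structure`. The sketch scores each section, keeps the top `k`, adds their ancestor sections so the result stays hierarchical, and returns an indented table of contents.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the <sec> tag and title attribute are assumptions.
SAMPLE = (
    '<article>'
    '<sec title="Introduction">Short intro.</sec>'
    '<sec title="Methods">'
    '<sec title="Features">A long description of the element features '
    'used by the summariser.</sec>'
    '<sec title="Training">Brief.</sec>'
    '</sec>'
    '<sec title="Results">Some discussion of results.</sec>'
    '</article>'
)

def summarise_structure(xml_text, k=3):
    """Return an indented table of contents holding the k best-scoring sections."""
    root = ET.fromstring(xml_text)

    # Record each element's parent and depth; iter() walks in document order,
    # so a parent's depth is always known before its children are visited.
    parent, depth = {}, {root: 0}
    for elem in root.iter():
        for child in elem:
            parent[child] = elem
            depth[child] = depth[elem] + 1

    sections = list(root.iter("sec"))

    def score(elem):
        # Assumed features: more contained text scores higher, deeper nesting lower.
        text_len = len("".join(elem.itertext()).strip())
        return text_len / (1 + depth[elem])

    keep = set()
    for elem in sorted(sections, key=score, reverse=True)[:k]:
        # Keep ancestor sections as well, so the summary remains a hierarchy.
        while elem is not None and elem.tag == "sec":
            keep.add(elem)
            elem = parent.get(elem)

    # Emit the kept sections in document order, indented by nesting level.
    return ["  " * (depth[e] - 1) + e.get("title", "") for e in sections if e in keep]

print("\n".join(summarise_structure(SAMPLE, k=3)))
```

With `k=3` the sample yields "Methods", an indented "Features", and "Results", while the short "Introduction" and "Training" sections are left out. A learned combination of features, evaluated against relevance data as the abstract describes, would replace the hand-written `score` function.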