Impact of linguistic analysis on the semantic graph coverage and learning of document extracts

Authors:
Jure Leskovec;Natasa Milic-Frayling;Marko Grobelnik
Affiliations:
Carnegie Mellon University, Pittsburgh, Pennsylvania and Jozef Stefan Institute, Ljubljana, Slovenia;Microsoft Research Ltd., Cambridge, United Kingdom;Jozef Stefan Institute, Ljubljana, Slovenia
Venue:
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 3
Year:
2005

Citing 4
Cited 6

A trainable document summarizer

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning of generic and user-focused summarization

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
The automatic construction of large-scale corpora for summarization research

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Graph-based ranking algorithms for sentence extraction, applied to text summarization

ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions

Web projections: learning from contextual subgraphs of the web

Proceedings of the 16th international conference on World Wide Web
Automatic summarising: The state of the art

Information Processing and Management: an International Journal
An Ontology-Based Approach to Text Summarization

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Exploring content models for multi-document summarization

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A new approach for multi-document update summarization

Journal of Computer Science and Technology
Automatic summarization

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts of ACL 2011

Quantified Score

Hi-index	0.00

Visualization

Abstract

Automatic document summarization is a problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate the machine learning method for extracting full sentences from documents based on the document semantic graph structure. In particular, we explore how the Support Vector Machines (SVM) learning method is affected by the quality of linguistic analyses and the corresponding semantic graph representations. We apply two types of linguistic analysis: (1) a simple part-of-speech tagging of noun phrases and verbs and (2) full logical form analysis which identifies Subject-Predicate-Object triples, and then build the semantic graphs. We train the SVM classifier to identify summary nodes and use these nodes to extract sentences. Experiments with the DUC 2002 and CAST datasets show that the SVM based extraction of sentences does not differ significantly for the simple and the sophisticated syntactic analysis. In both cases the graph attributes used in learning are essential for the classifier performance and the quality of extracted summaries.