Data-driven approaches to sentence compression define the task as dropping an arbitrary subset of words from the input sentence while retaining the important information and grammaticality. We show that only 16% of the compressed sentences observed in the domain of subtitling can be accounted for in this way. We argue that this is partly due to the lack of appropriate evaluation material, and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We then analyse the remaining cases, in which deletion alone failed to achieve the required level of compression, and find that word-order changes and paraphrasing are crucial there. We therefore argue for more elaborate sentence compression models that incorporate paraphrasing and word reordering, and report preliminary results of applying a recently proposed, more powerful compression model in the context of subtitling for Dutch.
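The deletion-only criterion used above can be tested mechanically: a compressed sentence is reachable by word deletion alone if and only if its tokens form a subsequence of the source sentence. A minimal sketch of such a test (the function name and example sentences are illustrative, not taken from the paper; real corpus studies would also need tokenization and case normalization):

```python
def is_deletion_only(source_tokens, compressed_tokens):
    """Return True iff the compression can be obtained from the source
    by deleting words only, i.e. it is a subsequence of the source."""
    it = iter(source_tokens)
    # `tok in it` advances the iterator, so order is enforced.
    return all(tok in it for tok in compressed_tokens)

src = "the match was cancelled because of the heavy rain".split()
print(is_deletion_only(src, "the match was cancelled".split()))  # True: pure deletion
print(is_deletion_only(src, "cancelled the match".split()))      # False: needs reordering
```

Compressions involving word-order changes or paraphrases, such as the second example, fail this test; these are exactly the cases the abstract argues a pure deletion model cannot cover.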