Automatic annotation of corpora for text summarisation: a comparative study

Authors:
Constantin Orăsan
Affiliations:
Research Group in Computational Linguistics, School of Humanities, Languages and Social Sciences, University of Wolverhampton, Wolverhampton, UK
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

This paper presents two methods which automatically produce annotated corpora for text summarisation on the basis of human-produced abstracts. Both methods identify the set of sentences from the document which best conveys the information in the human-produced abstract. The first method relies on a greedy algorithm, whilst the second uses a genetic algorithm. Both methods allow the number of sentences to be annotated to be specified, which is an advantage over existing methods. A comparison of the two approaches shows that the genetic algorithm is appropriate when the number of sentences to be annotated is smaller than the number of sentences in an ideal gold standard with no length restrictions, whereas the greedy algorithm should be used in all other cases.
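
The two selection strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not say how similarity between an extract and the human-produced abstract is measured, nor how the genetic algorithm is configured, so everything below is an assumption made for illustration: the bag-of-words cosine similarity, the function names (`fitness`, `greedy_select`, `genetic_select`), and the genetic operators and parameter values. It is not the paper's implementation.

```python
import math
import random
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return Counter(tok for tok in tokens if tok)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fitness(sentences, abstract, chosen):
    """Assumed objective: similarity of the concatenated chosen sentences to the abstract."""
    extract = Counter()
    for i in chosen:
        extract.update(bag_of_words(sentences[i]))
    return cosine(extract, bag_of_words(abstract))


def greedy_select(sentences, abstract, n):
    """Repeatedly add the sentence that yields the most similar extract."""
    chosen = []
    for _ in range(min(n, len(sentences))):
        remaining = [i for i in range(len(sentences)) if i not in chosen]
        best = max(remaining, key=lambda i: fitness(sentences, abstract, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)


def genetic_select(sentences, abstract, n, pop_size=30, generations=50, seed=0):
    """Evolve sets of n sentence indices (hypothetical operators and parameters)."""
    rng = random.Random(seed)
    n = min(n, len(sentences))
    indices = range(len(sentences))

    def score(chromosome):
        return fitness(sentences, abstract, chromosome)

    population = [rng.sample(indices, n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            pool = list(set(mum) | set(dad))  # crossover: draw n indices from both parents
            child = rng.sample(pool, n)
            if n > 0 and rng.random() < 0.2:  # mutation: swap in an unused sentence
                unused = [i for i in indices if i not in child]
                if unused:
                    child[rng.randrange(n)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return sorted(max(population, key=score))


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "Genetic algorithms evolve candidate solutions.",
        "Stock prices fell sharply on Monday.",
        "Greedy algorithms pick the best local choice.",
    ]
    summary = "Greedy algorithms and genetic algorithms are compared."
    print(greedy_select(doc, summary, 2))  # prints [1, 3]
    print(genetic_select(doc, summary, 2))
```

Both functions take the number of sentences `n` as a parameter, mirroring the property the abstract highlights. The greedy version commits to one sentence at a time, while the genetic version scores whole candidate sets, which is one plausible reason the two could behave differently under tight length limits.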