New Feature Sets for Summarization by Sentence Extraction

  • Authors:
  • Hans van Halteren

  • Affiliations:
  • -

  • Venue:
  • IEEE Intelligent Systems
  • Year:
  • 2003

Abstract

Machine learning feature sets that were originally developed for authorship attribution can be used for summarization by sentence extraction. In the author's pilot experiment, these feature sets distinguished between extract and nonextract sentences significantly better than a random baseline classifier, but they had to be carefully combined with other features to outperform a positional baseline classifier. In the DUC 2002 competition, a combination system trained on 400-word single-document extracts was one of the best performers in the 200- and 400-word multidocument extraction task. Further experiments showed that this system could be improved significantly with training material that better reflected the intended task.
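The abstract does not define the positional baseline classifier. A minimal sketch follows. It assumes one common form: label leading sentences as extract sentences until a word budget (for example 200 or 400 words) would be exceeded, and label all later sentences as nonextract. The function names `split_sentences` and `positional_labels` are hypothetical illustrations, not the paper's implementation.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter for illustration only: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def positional_labels(sentences: List[str], word_budget: int) -> List[bool]:
    """Hypothetical positional baseline: True = extract, False = nonextract.

    Leading sentences are marked as extract while the running word count
    stays within word_budget; once one sentence does not fit, it and every
    later sentence are marked as nonextract.
    """
    labels: List[bool] = []
    used = 0
    budget_open = True
    for sentence in sentences:
        n_words = len(sentence.split())
        if budget_open and used + n_words <= word_budget:
            labels.append(True)
            used += n_words
        else:
            budget_open = False
            labels.append(False)
    return labels


if __name__ == "__main__":
    doc = "One two three. Four five six seven. Eight nine."
    sents = split_sentences(doc)
    labels = positional_labels(sents, word_budget=7)
    print([s for s, keep in zip(sents, labels) if keep])
```

A baseline of this kind ignores sentence content entirely. That is why it is a meaningful bar for content-based feature sets, such as the authorship-attribution features described above, to clear.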