Terms derived from frequent sequences for extractive text summarization

  • Authors:
  • Yulia Ledeneva;Alexander Gelbukh;René Arnulfo García-Hernández

  • Affiliations:
  • Natural Language and Text Processing Laboratory, Center for Computing Research, National Polytechnic Institute, Mexico;Natural Language and Text Processing Laboratory, Center for Computing Research, National Polytechnic Institute, Mexico;Instituto Tecnologico de Toluca, Mexico

  • Venue:
  • CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
  • Year:
  • 2008

Abstract

Automatic text summarization helps the user to quickly understand large volumes of information. We present a language- and domain-independent statistical method for single-document extractive summarization, i.e., producing a text summary by extracting some sentences from the given text. We show experimentally that words that are parts of bigrams that repeat more than once in the text are good terms to describe the text's contents, and so are the so-called maximal frequent sequences. We also show that using the frequency of a term as its weight gives good results (where we count only the occurrences of the term in repeating bigrams).
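
The idea of selecting terms from repeating bigrams and weighting them by frequency can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (split_sentences, tokenize, term_weights, summarize), the naive regular-expression sentence splitter, the restriction of bigrams to a single sentence, and the scoring of a sentence as the sum of the weights of its distinct terms are all assumptions made for illustration. Maximal frequent sequences are not covered here.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a naive splitter that breaks after '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased word tokens; no stemming or stop-word removal.
    return re.findall(r"\w+", sentence.lower())


def term_weights(sentences):
    """Weight each word by its occurrences inside repeating bigrams."""
    tokenized = [tokenize(s) for s in sentences]

    # Count bigrams (assumption: bigrams do not cross sentence boundaries).
    bigrams = Counter()
    for words in tokenized:
        bigrams.update(zip(words, words[1:]))
    repeating = {b for b, count in bigrams.items() if count >= 2}

    # A word occurrence counts only if it is covered by a repeating bigram.
    weights = Counter()
    for words in tokenized:
        covered = set()
        for i, bigram in enumerate(zip(words, words[1:])):
            if bigram in repeating:
                covered.update((i, i + 1))
        for i in covered:
            weights[words[i]] += 1
    return weights


def summarize(text, k=1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    weights = term_weights(sentences)
    # Assumption: sentence score = sum of weights of its distinct terms.
    scores = [sum(weights[w] for w in set(tokenize(s))) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, in a text where the bigram "text summarization" occurs in two sentences and no other bigram repeats, only the words "text" and "summarization" become terms, each with weight 2, and sentences that contain neither word score zero.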