SegGen: a genetic algorithm for linear text segmentation

  • Authors:
  • S. Lamprier; T. Amghar; B. Levrat; F. Saubion

  • Affiliations:
  • LERIA, Université d'Angers, Angers, France (all authors)

  • Venue:
  • IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence
  • Year:
  • 2007

Abstract

This paper describes SegGen, a new algorithm for linear text segmentation on general corpora. It aims to segment texts into thematically homogeneous parts. Several existing methods for this purpose create boundaries sequentially. Here, we propose to consider all boundaries simultaneously by means of a genetic algorithm. SegGen uses two criteria: maximization of the internal cohesion of the formed segments and minimization of the similarity between adjacent segments. First experimental results are promising, and SegGen appears to be very competitive with existing methods.
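
The idea of evolving all boundaries at once under the two criteria can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure, the encoding, or how the two criteria are combined, so everything below is an assumption made for illustration. In particular, the sketch assumes a 0/1 boundary vector between consecutive sentences, bag-of-words cosine similarity, a cohesion of 0 for single-sentence segments, and a simple weighted sum of the two criteria (weight `alpha`). The names `cosine`, `segments`, `fitness`, and `evolve` are hypothetical.

```python
import math
import random
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segments(bounds, n):
    """Turn a 0/1 boundary vector of length n-1 into (start, end) index pairs."""
    cuts = [0] + [i + 1 for i, b in enumerate(bounds) if b] + [n]
    return list(zip(cuts, cuts[1:]))

def fitness(bounds, vecs, alpha=0.5):
    """Assumed objective: weighted sum of cohesion and adjacent dissimilarity."""
    segs = segments(bounds, len(vecs))
    cohesion = []
    for s, e in segs:
        pairs = [(i, j) for i in range(s, e) for j in range(i + 1, e)]
        # Assumption: a single-sentence segment has cohesion 0,
        # which discourages cutting after every sentence.
        total = sum(cosine(vecs[i], vecs[j]) for i, j in pairs)
        cohesion.append(total / len(pairs) if pairs else 0.0)
    # Each segment is summarized by the sum of its sentence vectors.
    centroids = [sum((vecs[i] for i in range(s, e)), Counter()) for s, e in segs]
    dissim = [1.0 - cosine(a, b) for a, b in zip(centroids, centroids[1:])]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return alpha * mean(cohesion) + (1 - alpha) * mean(dissim)

def evolve(vecs, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Tiny elitist genetic algorithm over boundary vectors (needs >= 3 sentences)."""
    rng = random.Random(seed)
    n = len(vecs) - 1
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    score = lambda ind: fitness(ind, vecs)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation adds or removes a boundary.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

# Toy input: three sentences about pets followed by three about markets.
sentences = [
    "cat dog pet", "dog cat animal", "pet animal cat",
    "stock market price", "market price trade", "trade stock market",
]
vecs = [Counter(s.split()) for s in sentences]
best = evolve(vecs)
print(best, fitness(best, vecs))
```

On this toy input the assumed objective is highest for the single boundary after the third sentence, i.e. the vector `[0, 0, 1, 0, 0]`. Because each individual encodes a complete segmentation, adding or removing one boundary is always judged in the context of all the others, which is the contrast with sequential boundary creation that the abstract draws. A weighted sum is only one way to handle two criteria; a multi-objective scheme that keeps a set of non-dominated segmentations would be a natural alternative.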