ClassStruggle: a clustering based text segmentation

  • Authors:
  • Sylvain Lamprier;Tassadit Amghar;Bernard Levrat;Frederic Saubion

  • Affiliations:
  • Université Angers, Angers, France;Université Angers, Angers, France;Université Angers, Angers, France;Université Angers, Angers, France

  • Venue:
  • Proceedings of the 2007 ACM Symposium on Applied Computing
  • Year:
  • 2007

Abstract

This paper describes ClassStruggle, an algorithm for linear text segmentation on general corpora. It relies on an initial clustering of the sentences of the text. This preliminary partitioning provides a global view of the relations between sentences in the text, because similarities are considered within a group rather than individually. ClassStruggle is based on the distribution of the occurrences of the members of each class. During the process, the clusters evolve according to notions of proximity and of layout in the text, with the aim of forming groups that contain only sentences related to the same topic development. Finally, boundaries are placed between consecutive sentences that belong to different classes. First experimental results are promising: ClassStruggle appears to be very competitive with existing methods.
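The abstract does not give ClassStruggle's similarity measure, its cluster-evolution rules, or its parameters. The following is therefore only a minimal Python sketch of the general pipeline it outlines: cluster the sentences, let proximity in the text adjust the clusters, then place a boundary wherever two consecutive sentences fall in different classes. The bag-of-words cosine similarity, the single-pass clustering with a hypothetical `threshold` of 0.2, and the hypothetical `smooth_isolated` rule are illustrative stand-ins, not the authors' method.

```python
import math
from collections import Counter


def bow(sentence: str) -> Counter:
    """Bag-of-words vector: lowercased tokens with surrounding punctuation stripped."""
    return Counter(w.lower().strip(".,;:!?") for w in sentence.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_sentences(sentences: list[str], threshold: float = 0.2) -> list[int]:
    """Hypothetical single-pass clustering (NOT ClassStruggle's own step):
    each sentence joins the most similar cluster centroid if the similarity
    reaches `threshold`, otherwise it starts a new cluster."""
    centroids: list[Counter] = []
    labels: list[int] = []
    for s in sentences:
        v = bow(s)
        best, best_sim = -1, 0.0
        for k, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(v)  # add the sentence's counts to the centroid
            labels.append(best)
        else:
            centroids.append(Counter(v))
            labels.append(len(centroids) - 1)
    return labels


def smooth_isolated(labels: list[int]) -> list[int]:
    """Hypothetical proximity rule: a sentence whose two neighbours share
    a different class is moved into that class."""
    out = list(labels)
    for i in range(1, len(out) - 1):
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out


def boundaries(labels: list[int]) -> list[int]:
    """Indices i such that a boundary lies between sentence i-1 and sentence i."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def segment(sentences: list[str]) -> list[int]:
    """Cluster, smooth by proximity, then cut between differing classes."""
    return boundaries(smooth_isolated(cluster_sentences(sentences)))
```

For instance, given the four sentences "The cat sat on the mat.", "The cat chased the mouse.", "Stock markets fell sharply today." and "Investors sold stock as markets fell.", `segment` returns `[2]`: one boundary before the third sentence. In a faithful implementation, the smoothing pass is where the paper's actual proximity and layout criteria would replace the simple neighbour rule used here.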