A critique and improvement of an evaluation metric for text segmentation

  • Authors:
  • Lev Pevzner; Marti A. Hearst

  • Affiliations:
  • Harvard University, 380 Leverett Mail Center, Cambridge, MA; University of California, Berkeley, 102 South Hall #4600, Berkeley, CA

  • Venue:
  • Computational Linguistics
  • Year:
  • 2002

Abstract

The P_k evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the P_k metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-size window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
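
The window-sliding idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' reference implementation. The function name `window_diff`, the 0/1 list encoding (entry i is 1 when a boundary follows text unit i), and passing the window size `k` in directly are all assumptions made for this example.

```python
from typing import Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Return the fraction of size-k windows whose boundary counts disagree.

    Assumed encoding (not taken from the abstract): each sequence has one
    entry per text unit, and entry i is 1 if a segment boundary follows
    unit i, else 0.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < n")

    windows = n - k  # number of window positions as the window slides
    mismatches = 0
    for start in range(windows):
        # Count boundaries inside the current window for each segmentation.
        ref_count = sum(reference[start:start + k])
        hyp_count = sum(hypothesis[start:start + k])
        # Penalize the window whenever the two counts differ.
        if ref_count != hyp_count:
            mismatches += 1
    return mismatches / windows


# Hypothetical 10-unit text with true boundaries after units 2 and 6.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
near_miss = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # first boundary off by one unit
all_missed = [0] * 10                        # no boundaries predicted

print(window_diff(reference, reference, k=2))   # 0.0
print(window_diff(reference, near_miss, k=2))   # 0.25
print(window_diff(reference, all_missed, k=2))  # 0.5
```

In this toy case the near miss is penalized less than missing both boundaries outright. The score is 0 for a perfect match and grows toward 1 as more windows disagree. The choice of `k` is left to the caller here.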