Flexible intrinsic evaluation of hierarchical clustering for TDT

  • Authors:
  • James Allan; Ao Feng; Alvaro Bolivar

  • Affiliations:
  • University of Massachusetts, MA; University of Massachusetts, MA; University of Massachusetts, MA

  • Venue:
  • CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management
  • Year:
  • 2003

Abstract

The Topic Detection and Tracking (TDT) evaluation program has included a "cluster detection" task since its inception in 1996. Systems were required to process a stream of broadcast news stories and partition them into non-overlapping clusters. A system's effectiveness was measured by comparing the generated clusters to "truth" clusters created by human annotators. Starting in 2003, TDT is moving to a more realistic model that permits overlapping clusters (stories may be on more than one topic) and encourages the creation of a hierarchy to structure the relationships between clusters (topics). We explore a range of possible evaluation models for this modified TDT clustering task to understand the best approach for mapping between the human-generated "truth" clusters and a much richer hierarchical structure. We demonstrate that some obvious evaluation techniques fail for degenerate cases. For a few others, we attempt to develop an intuitive sense of what the evaluation numbers mean. We settle on approaches that strike a strong balance between cluster errors (misses and false alarms) and the distance traveled between stories within the hierarchy.
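The abstract describes scoring a system by comparing its clusters to human-annotated "truth" clusters in terms of misses and false alarms. The sketch below illustrates only that comparison, under simple assumptions. Each cluster is a set of story IDs, clusters may overlap, and each truth topic is mapped to the single system cluster with the lowest weighted miss/false-alarm cost. The function names and the weights `c_miss` and `c_fa` are hypothetical. This is not the paper's evaluation measure, and it leaves out the hierarchy travel-distance component entirely.

```python
from typing import Dict, Optional, Set, Tuple


def miss_fa_rates(truth: Set[str], cluster: Set[str],
                  all_stories: Set[str]) -> Tuple[float, float]:
    """Return (P_miss, P_fa) for one truth topic against one system cluster.

    P_miss: fraction of on-topic stories absent from the cluster.
    P_fa:   fraction of off-topic stories that the cluster wrongly includes.
    """
    non_topic = all_stories - truth
    misses = len(truth - cluster)        # on-topic stories the cluster left out
    false_alarms = len(cluster - truth)  # off-topic stories the cluster pulled in
    p_miss = misses / len(truth) if truth else 0.0
    p_fa = false_alarms / len(non_topic) if non_topic else 0.0
    return p_miss, p_fa


def best_cluster(truth: Set[str], clusters: Dict[str, Set[str]],
                 all_stories: Set[str], c_miss: float = 1.0,
                 c_fa: float = 1.0) -> Tuple[Optional[str], float]:
    """Map a truth topic to the cluster with the lowest weighted error cost.

    Hypothetical cost: c_miss * P_miss + c_fa * P_fa. Clusters are plain
    sets, so overlapping clusters need no special handling.
    """
    best_id: Optional[str] = None
    best_cost = float("inf")
    for cid, members in clusters.items():
        p_miss, p_fa = miss_fa_rates(truth, members, all_stories)
        cost = c_miss * p_miss + c_fa * p_fa
        if cost < best_cost:  # ties keep the first cluster seen
            best_id, best_cost = cid, cost
    return best_id, best_cost


if __name__ == "__main__":
    stories = {"s1", "s2", "s3", "s4", "s5", "s6"}
    truth = {"s1", "s2", "s3"}
    # Clusters A and C overlap on story s4.
    clusters = {
        "A": {"s1", "s2", "s3", "s4"},
        "B": {"s1"},
        "C": {"s4", "s5", "s6"},
    }
    print(best_cluster(truth, clusters, stories))
```

In this toy example, cluster A misses nothing and has one false alarm out of three off-topic stories, so it wins with a cost of 1/3. In a hierarchy, one could treat every node as a candidate cluster, but how a node maps to a story set (for example, whether it includes its descendants' stories) is an assumption this sketch does not settle.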