Automatically labeling hierarchical clusters

  • Authors:
  • Pucktada Treeratpituk;Jamie Callan

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • dg.o '06 Proceedings of the 2006 international conference on Digital government research
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Government agencies must often quickly organize and analyze large amounts of textual information, for example comments received as part of notice and comment rulemaking. Hierarchical organization is popular because it represents information at different levels of detail and is convenient for interactive browsing. Good hierarchical clustering algorithms are available, but there are few good solutions for automatically labeling the nodes in a cluster hierarchy.This paper presents a simple algorithm that automatically assigns labels to hierarchical clusters. The algorithm evaluates candidate labels using information from the cluster, the parent cluster, and corpus statistics. A trainable threshold enables the algorithm to assign just a few high-quality labels to each cluster. Experiments with Open Directory Project (ODP) hierarchies indicate that the algorithm creates cluster labels that are similar to labels created by ODP editors.