Segmentation and detection at IBM: hybrid statistical models and two-tiered clustering

  • Authors:
  • S. Dharanipragada; M. Franz; J. S. McCarley; T. Ward; W.-J. Zhu

  • Affiliations:
  • IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY (all authors)

  • Venue:
  • Topic detection and tracking
  • Year:
  • 2002

Abstract

IBM's story segmentation system combines decision tree and maximum entropy models, both of which take a variety of lexical, prosodic, semantic, and structural features as inputs. Both types of models are source-specific, and combining them substantially lowers the segmentation cost Cseg. IBM's topic detection system introduces a minimal hierarchy into the clustering: each cluster consists of one or more microclusters. We investigate the importance of merging microclusters and propose a merging strategy that improves our performance.
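
The two-tiered structure described in the abstract can be illustrated with a small sketch. The abstract does not specify how microclusters are represented or merged, so everything here is an assumption made for illustration: the names `Microcluster`, `Cluster`, and `merge_microclusters` are hypothetical, documents are treated as sparse term-weight vectors, and merging is done greedily by cosine similarity of centroids against a hypothetical threshold. It is not the authors' merging strategy.

```python
from dataclasses import dataclass, field
from math import sqrt

# Assumption: a document is a sparse term -> weight mapping.
Vector = dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Microcluster:
    """Lower tier: a small group of documents (hypothetical structure)."""
    docs: list[Vector] = field(default_factory=list)

    def centroid(self) -> Vector:
        total: Vector = {}
        for doc in self.docs:
            for term, weight in doc.items():
                total[term] = total.get(term, 0.0) + weight
        n = len(self.docs) or 1
        return {term: weight / n for term, weight in total.items()}


@dataclass
class Cluster:
    """Upper tier: a cluster made of one or more microclusters."""
    micro: list[Microcluster] = field(default_factory=list)


def merge_microclusters(cluster: Cluster, threshold: float) -> Cluster:
    """Greedily merge microclusters whose centroids are similar.

    Assumed rule: a microcluster joins the first already-kept microcluster
    whose centroid has cosine similarity >= threshold; otherwise it is kept
    as its own microcluster. The input cluster is left unmodified.
    """
    merged: list[Microcluster] = []
    for mc in cluster.micro:
        for target in merged:
            if cosine(mc.centroid(), target.centroid()) >= threshold:
                target.docs.extend(mc.docs)
                break
        else:
            merged.append(Microcluster(list(mc.docs)))
    return Cluster(merged)
```

In this sketch, a cluster holding three single-document microclusters, two of which share the same term, would end up with two microclusters after merging at a threshold of 0.9. The threshold and the greedy first-match order are design choices of the illustration only.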