Statistical models of topical content

  • Authors:
  • J. P. Yamron;L. Gillick;P. van Mulbregt;S. Knecht

  • Affiliations:
  • formerly of Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA;formerly of Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA;formerly of Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA;Dragon Systems/Lernout & Hauspie, 320 Nevada Street, Newton, MA

  • Venue:
  • Topic detection and tracking
  • Year:
  • 2002

Abstract

In this chapter we explore the behavior of two statistical models, one based on simple unigrams and the other on the beta-binomial distribution, as applied to the problem of modeling story generation. We describe how these models can be incorporated into information extraction applications, particularly the Tracking and Detection engines built for the Topic Detection and Tracking evaluations sponsored by DARPA. Tracking systems based on the two models have complementary strengths and weaknesses: a Beta-Binomial system yields high precision at high decision thresholds, but its performance degrades quickly as the threshold drops; a Unigram system is not as strong at high decision thresholds, but is very good at suppressing false alarms at lower thresholds. We describe the features of these systems that give rise to this behavior, and discuss ways that each system might be improved by borrowing from the other. We also discuss our Detection system, and how improvements in Tracking should lead to improvements in Detection.
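
As a rough illustration of the two model families named in the abstract, the following Python sketch scores a story against a topic with a smoothed unigram model and evaluates a beta-binomial probability for a single word count. It is not the chapter's formulation, which the abstract does not give: the add-alpha smoothing, the length-normalized log-likelihood ratio, the function names (`unigram_log_prob`, `beta_binomial_log_pmf`, `track_score`), the toy data, and the threshold of 0.0 are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_log_prob(tokens, counts, total, vocab_size, alpha=1.0):
    """Log-probability of tokens under an add-alpha smoothed unigram model.

    The smoothing scheme is an assumption, not taken from the chapter.
    """
    denom = total + alpha * vocab_size
    return sum(math.log((counts.get(t, 0) + alpha) / denom) for t in tokens)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_log_pmf(k, n, a, b):
    """Log-probability that a word occurs k times in an n-token story
    under a beta-binomial distribution with shape parameters a and b."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def track_score(tokens, topic_counts, background_counts, vocab_size):
    """Length-normalized log-likelihood ratio of topic vs. background
    unigram models (a hypothetical tracking score)."""
    topic_total = sum(topic_counts.values())
    bg_total = sum(background_counts.values())
    llr = (unigram_log_prob(tokens, topic_counts, topic_total, vocab_size)
           - unigram_log_prob(tokens, background_counts, bg_total, vocab_size))
    return llr / max(len(tokens), 1)


# Toy data, invented for this example.
topic = Counter("earthquake rescue earthquake aid".split())
background = Counter("market earthquake election sports market weather".split())
vocab = set(topic) | set(background)

story = "earthquake aid rescue".split()
score = track_score(story, topic, background, len(vocab))
on_topic = score > 0.0  # hypothetical decision threshold
```

Raising or lowering the threshold in the last line trades missed stories against false alarms, which is the operating-point behavior the abstract compares across the two systems.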