Capturing term dependencies using a language model based on sentence trees

  • Authors: Ramesh Nallapati; James Allan

  • Affiliations: University of Massachusetts, Amherst, MA; University of Massachusetts, Amherst, MA

  • Venue: Proceedings of the eleventh international conference on Information and knowledge management
  • Year: 2002

Abstract

We describe a new probabilistic Sentence Tree Language Modeling approach that captures term dependency patterns in the Story Link Detection task of Topic Detection and Tracking (TDT). New features of the approach include modeling the syntactic structure of sentences in documents by a sentence-bin approach, and a computationally efficient algorithm for capturing the most significant sentence-level term dependencies using a Maximum Spanning Tree approach, similar to Van Rijsbergen's modeling of document-level term dependencies. The new model is a good discriminator of on-topic and off-topic story pairs, providing evidence that sentence-level term dependencies contain significant information about relevance. Although runs on a subset of the TDT2 corpus show that the model is outperformed by the unigram language model, a mixture of the unigram and Sentence Tree models is shown to improve on the best performance, especially in the regions of low false alarms.
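As a rough illustration of the Maximum Spanning Tree idea mentioned in the abstract, the sketch below builds a tree over the terms of one sentence, keeping the strongest pairwise dependencies. It is not the authors' implementation: the function name `maximum_spanning_tree`, the `SCORES` table, and the example terms are hypothetical, and the abstract does not say how dependency strength is measured, so the weights here are made-up numbers.

```python
def maximum_spanning_tree(terms, weight):
    """Greedy (Prim-style) maximum spanning tree over a sentence's terms.

    `weight(u, v)` is an assumed symmetric dependency score; the abstract
    does not specify the measure, so any pairwise score can be plugged in.
    Returns a list of (term_in_tree, new_term, weight) edges.
    """
    terms = list(dict.fromkeys(terms))  # drop repeated terms, keep order
    if not terms:
        return []
    in_tree = {terms[0]}
    edges = []
    while len(in_tree) < len(terms):
        best = None
        # Find the heaviest edge joining the tree to a term outside it.
        for u in terms:
            if u not in in_tree:
                continue
            for v in terms:
                if v in in_tree:
                    continue
                w = weight(u, v)
                if best is None or w > best[2]:
                    best = (u, v, w)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical dependency scores for one sentence (illustrative values only).
SCORES = {
    frozenset(("stock", "market")): 0.9,
    frozenset(("market", "crash")): 0.7,
    frozenset(("stock", "crash")): 0.4,
    frozenset(("crash", "investors")): 0.6,
    frozenset(("stock", "investors")): 0.5,
    frozenset(("market", "investors")): 0.3,
}


def score(u, v):
    # Unknown pairs are treated as having no dependency.
    return SCORES.get(frozenset((u, v)), 0.0)


tree = maximum_spanning_tree(["stock", "market", "crash", "investors"], score)
for u, v, w in tree:
    print(f"{u} - {v}: {w}")
```

A sentence with n distinct terms yields n - 1 edges, so only the strongest dependencies that connect all terms are retained rather than all n(n - 1)/2 pairs.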