Computational linkuistics: word triggers across hyperlinks

  • Authors:
  • Dragomir R. Radev;Hong Qi;Daniel Tam;Adam Winkel

  • Affiliations:
  • University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI

  • Venue:
  • HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
  • Year:
  • 2004

Abstract

It is known that content words tend to be self-triggers: the probability that a content word appears more than once in a document, given that it has already appeared once, is significantly higher than the probability of its first occurrence. We look at self-triggerability across hyperlinks on the Web. We show that the probability that a word w_j appears in a Web document d_i depends on the presence of w_j in documents pointing to d_i. In Document Modeling, we propose a correction factor, R, which indicates how much more likely a word is to appear in a document given that a document containing the same word links to it.
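
The abstract does not give a formula for R, so the following is only a minimal sketch under an assumed definition: R is taken to be the ratio between the probability that a word appears in a document that has at least one in-linking document containing the word, and the baseline probability that the word appears in any document. The function name `correction_factor`, the data layout, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set, Tuple


def correction_factor(
    word: str,
    docs: Dict[str, Set[str]],
    links: List[Tuple[str, str]],
) -> Optional[float]:
    """Estimate an assumed correction factor R for `word`.

    docs  maps a document id to the set of words it contains.
    links is a list of (source, target) hyperlink pairs.
    Returns None when the ratio is undefined.
    """
    if not docs:
        return None

    # Baseline: fraction of all documents that contain the word.
    baseline = sum(1 for words in docs.values() if word in words) / len(docs)

    # Documents pointed to by at least one document containing the word.
    targets = {
        dst
        for src, dst in links
        if dst in docs and word in docs.get(src, set())
    }
    if baseline == 0 or not targets:
        return None

    # Conditional: fraction of those target documents that contain the word.
    conditional = sum(1 for t in targets if word in docs[t]) / len(targets)
    return conditional / baseline


# Toy corpus (hypothetical data, for illustration only).
docs = {
    "a": {"radev", "nlp"},
    "b": {"radev"},
    "c": {"web"},
    "d": {"nlp"},
}
links = [("a", "b"), ("c", "d")]

print(correction_factor("radev", docs, links))  # prints 2.0
```

Under this assumed definition, a value above 1 would mean the word is more likely to appear in a document when a page linking to that document also contains it, which is the cross-hyperlink trigger effect the abstract describes.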