Computational linkuistics: word triggers across hyperlinks

Authors:
Dragomir R. Radev;Hong Qi;Daniel Tam;Adam Winkel
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004

Abstract

It is known that content words tend to be self-triggers. That is, the probability that a content word appears more than once in a document, given that it already appears once, is significantly higher than the probability of its first occurrence. We look at self-triggerability across hyperlinks on the Web. We show that the probability that a word $w_j$ appears in a Web document $d_i$ depends on the presence of $w_j$ in documents pointing to $d_i$. For document modeling, we propose the use of a correction factor, $R$, which indicates how much more likely a word is to appear in a document given that another document containing the same word links to it.
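The abstract describes the correction factor only in words. One natural reading, assumed here and not taken from the paper's own definitions, is $R(w) = P(w \in d_i \mid \exists\, d_k \to d_i,\ w \in d_k) / P(w \in d_i)$: the chance that a document contains $w$ given that some document linking to it contains $w$, divided by the baseline chance that any document contains $w$. The sketch below estimates that ratio with raw relative frequencies over a toy link graph. The function name `link_correction_factor`, the set-of-words document representation, the `(source, target)` link pairs, the toy data, and the absence of smoothing are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def link_correction_factor(
    docs: Dict[str, Set[str]],
    links: List[Tuple[str, str]],
    word: str,
) -> float:
    """Hypothetical estimator of R for one word (assumed definition, no smoothing).

    R = P(word in target | some in-linking source contains word) / P(word in any doc).
    Returns NaN when either probability cannot be estimated from the data.
    """
    # Baseline: fraction of all documents that contain the word.
    prior = sum(1 for words in docs.values() if word in words) / len(docs)

    # Targets with at least one in-linking document that contains the word.
    triggered = {dst for src, dst in links if word in docs[src]}
    if not triggered or prior == 0.0:
        return float("nan")

    # Conditional: fraction of those targets that also contain the word.
    conditional = sum(1 for d in triggered if word in docs[d]) / len(triggered)
    return conditional / prior


# Toy corpus (hypothetical): each document is reduced to its set of words.
toy_docs = {
    "a": {"noriega", "panama"},
    "b": {"noriega", "trial"},
    "c": {"weather"},
    "d": {"sports"},
}
toy_links = [("a", "b"), ("c", "d")]  # (source, target) hyperlinks

if __name__ == "__main__":
    # "noriega": prior = 2/4, the only triggered target "b" contains it, so R = 2.0
    print(link_correction_factor(toy_docs, toy_links, "noriega"))
```

An $R$ above 1 means the word is more likely to appear in a page when a page linking to it already contains the word; an $R$ of 1 means the link carries no information about that word. A practical estimator over a real Web crawl would likely need smoothing for rare words, which this sketch deliberately omits.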