Improved algorithms for topic distillation in a hyperlinked environment. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
A language modeling approach to information retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Information retrieval as statistical translation. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Authoritative sources in a hyperlinked environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms.
Learning curved multinomial subfamilies for natural language processing and information retrieval. In ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Bayesian network model for semi-structured document classification. Information Processing and Management: An International Journal, special issue on Bayesian networks and information retrieval.
Link analysis ranking: algorithms, theory, and experiments. ACM Transactions on Internet Technology (TOIT).
Core algorithms in the CLEVER system. ACM Transactions on Internet Technology (TOIT).
Effective use of WordNet semantics via kernel-based learning. In CoNLL '05: Proceedings of the Ninth Conference on Computational Natural Language Learning.
In the World Wide Web, myriad hyperlinks connect documents and pages to create an unprecedented, highly complex graph structure: the Web graph. This paper presents a novel approach to learning probabilistic models of the Web, which can be used to make reliable predictions about the connectivity and information content of Web documents. The proposed method is a probabilistic dimension reduction technique that recasts and unites Latent Semantic Analysis and Kleinberg's Hubs-and-Authorities algorithm in a statistical setting. This is meant to be a first step towards the development of a statistical foundation for Web-related information technologies. Although this paper does not focus on a particular application, a variety of algorithms operating in the Web/Internet environment can take advantage of the presented techniques, including search engines, Web crawlers, and information agent systems.
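For context on the link-analysis side that the abstract says is recast probabilistically, the following is a minimal, hedged sketch of Kleinberg's Hubs-and-Authorities (HITS) power iteration. The graph representation, function name, and iteration count are illustrative choices, not part of the paper's own method.

```python
def hits(adj, iters=50):
    """Sketch of HITS power iteration.

    adj: dict mapping each node to the list of nodes it links to
    (an adjacency-list view of the Web graph; names are invented).
    Returns (hub, auth) score dictionaries, each L2-normalized.
    """
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Authority update: sum of hub scores of pages linking in.
        auth = {n: sum(hub[u] for u in adj if n in adj[u]) for n in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum of authority scores of pages linked to.
        hub = {n: sum(auth[v] for v in adj.get(n, ())) for n in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

On a toy graph where pages "a" and "b" both link to "c", the iteration assigns "c" the top authority score and "a"/"b" the top hub scores, which is the mutually reinforcing structure the paper reinterprets as a latent-variable model.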