Web-centric language models

Authors:
Jaap Kamps
Affiliations:
University of Amsterdam, Amsterdam, The Netherlands
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

We investigate language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We analyze the prior probability of relevance for a wide range of non-content features, shedding further light on the importance of non-content features for web retrieval. This directly explains the success or failure of various techniques, e.g., why the link topology is particularly helpful to single out important sites. Language models can naturally incorporate multiple document representations, as well as non-content information. For the former, we employ mixture language models based on document full-text, incoming anchor-text, and document titles. For the latter, we study a range of priors based on document length, URL structure, and link topology. We look at three types of topics (distillation, home page, and named page) as well as a mixed query set. We find that the mixture models lead to considerable improvement of retrieval effectiveness for all topic types. The web-centric priors generally lead to further improvement of retrieval effectiveness.
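
To make the combination described in the abstract concrete, the following is a minimal sketch of how a mixture language model over several document representations can be combined with a non-content prior. It is an illustration under stated assumptions, not the paper's implementation: the function name `mixture_log_score`, the linear interpolation with a collection model, the weight values, the smoothing parameter `lam`, and the toy data are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def mixture_log_score(query, doc_fields, collection, weights, lam=0.15, prior=1.0):
    """Return log( P(d) * prod_t P(t | d) ) for one document.

    query:      list of query terms
    doc_fields: dict of representation name -> token list
                (e.g. full text, incoming anchor text, title)
    collection: token list used as the background model
    weights:    dict of representation name -> mixture weight (assumed values)
    lam:        weight of the document mixture vs. the collection (assumed)
    prior:      non-content prior P(d), e.g. derived from document length,
                URL structure, or link topology
    """
    coll_counts = Counter(collection)
    coll_len = len(collection)
    field_counts = {name: Counter(tokens) for name, tokens in doc_fields.items()}

    score = math.log(prior)
    for term in query:
        # Weighted sum of maximum-likelihood estimates, one per representation.
        p_doc = 0.0
        for name, counts in field_counts.items():
            length = sum(counts.values())
            if length:  # skip empty representations (e.g. no incoming anchors)
                p_doc += weights[name] * counts[term] / length
        p_coll = coll_counts[term] / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score


# Toy data, purely illustrative.
collection = "welcome to our research pages university amsterdam home city guide".split()
weights = {"text": 0.6, "anchor": 0.2, "title": 0.2}
query = ["university", "amsterdam"]

doc_a = {"text": "welcome to our research pages".split(),
         "anchor": "university amsterdam".split(),
         "title": ["home"]}
doc_b = {"text": "welcome to our research pages".split(),
         "anchor": [],
         "title": ["home"]}

score_a = mixture_log_score(query, doc_a, collection, weights)
score_b = mixture_log_score(query, doc_b, collection, weights)
```

In this toy setup the two documents have identical full text and titles, so any difference in score comes from the anchor-text representation alone. Because the prior enters as an additive term in log space, changing it shifts a document's score without touching the content-based part.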