Exploiting site-level information to improve web search

Authors:
Andrei Broder;Evgeniy Gabrilovich;Vanja Josifovski;George Mavromatis;Donald Metzler;Jane Wang
Affiliations:
Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA;University of Southern California, Marina del Rey, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 13
Cited 2

The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment

Journal of the ACM (JACM)
Cumulated gain-based evaluation of IR techniques

ACM Transactions on Information Systems (TOIS)
Improving web search by the identification of contextual information

Intelligent exploration of the web
Cluster-based retrieval using language models

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus structure, language models, and ad hoc information retrieval

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
A study of relevance propagation for web search

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Markov random field model for term dependencies

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
LDA-based document models for ad-hoc retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Smoothing document language models with probabilistic term count propagation

Information Retrieval
Beyond bags of words: effectively modeling dependence and features in information retrieval

Beyond bags of words: effectively modeling dependence and features in information retrieval
Building enriched document representations using aggregated anchor text

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Expected reciprocal rank for graded relevance

Proceedings of the 18th ACM conference on Information and knowledge management

To each his own: personalized content selection based on text comprehensibility

Proceedings of the fifth ACM international conference on Web search and data mining
Aggregating evidence from hospital departments to improve medical records search

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Ranking Web search results has long evolved beyond simple bag-of-words retrieval models. Modern search engines routinely employ machine learning ranking that relies on exogenous relevance signals. Yet the majority of current methods still evaluate each Web page out of context. In this work, we introduce a novel source of relevance information for Web search by evaluating each page in the context of its host Web site. For this purpose, we devise two strategies for compactly representing entire Web sites. We formalize our approach by building two indices, a traditional page index and a new site index, where each "document" represents the an entire Web site. At runtime, a query is first executed against both indices, and then the final page score for a given query is produced by combining the scores of the page and its site. Experimental results carried out on a large-scale Web search test collection from a major commercial search engine confirm the proposed approach leads to consistent and significant improvements in retrieval effectiveness.