Query-independent evidence in home page finding

Authors:
Trystan Upstill; Nick Craswell; David Hawking
Affiliations:
Australian National University, Canberra, Australia; CSIRO Mathematical and Information Sciences, Canberra, Australia; CSIRO Mathematical and Information Sciences, Canberra, Australia
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2003

Abstract

Hyperlink recommendation evidence, that is, evidence based on the structure of a web's link graph, is widely exploited by commercial Web search systems. However, there is little published work to support its popularity. Another form of query-independent evidence, URL-type, has been shown to be beneficial on a home page finding task. We compared the usefulness of these types of evidence on the home page finding task, when combined with both content and anchor text baselines. Our experiments made use of five query sets spanning three corpora: one enterprise crawl and the WT10g and VLC2 Web test collections.

We found that, in optimal conditions, all of the query-independent methods studied (in-degree, URL-type, and two variants of PageRank) offered a better-than-random improvement on a content-only baseline. However, only URL-type offered a better-than-random improvement on an anchor text baseline. In realistic settings, for either baseline, only URL-type offered consistent gains. In combination with URL-type, the anchor text baseline was more useful for finding popular home pages, whereas URL-type with content was more useful for finding randomly selected home pages. We conclude that a general home page finding system should combine evidence from document content, anchor text, and URL-type classification.
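Unlike in-degree or PageRank, URL-type evidence needs no link graph: it is read directly off the URL string. The sketch below is a minimal illustration, assuming the four-class scheme (root, subroot, path, file) described by Kraaij, Westerveld and Hiemstra in "The Importance of Prior Probabilities for Entry Page Search" (SIGIR 2002), with root URLs treated as the most likely home pages. The exact classification rules used in this paper may differ. The names `url_type`, `rerank_by_url_type`, `URL_TYPE_RANK` and `DEFAULT_PAGES`, and the rule that a final segment without a trailing slash counts as a file, are hypothetical choices made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical prior ordering: a lower value means "more likely a home page".
URL_TYPE_RANK = {"root": 0, "subroot": 1, "path": 2, "file": 3}

# Filenames treated as equivalent to the bare directory (an assumption).
DEFAULT_PAGES = {"index.html", "index.htm", "default.htm"}


def url_type(url: str) -> str:
    """Classify a URL as 'root', 'subroot', 'path' or 'file'."""
    # Accept bare host names such as "www.example.org" by adding a scheme.
    parsed = urlparse(url if "://" in url else "http://" + url)
    segments = [s for s in parsed.path.split("/") if s]

    if segments and segments[-1].lower() in DEFAULT_PAGES:
        # "/dir/index.html" is treated like "/dir/".
        segments = segments[:-1]
        ends_in_dir = True
    else:
        # Assumption: a last segment without a trailing slash is a file name.
        ends_in_dir = parsed.path == "" or parsed.path.endswith("/")

    if not segments:
        return "root"      # host name only
    if not ends_in_dir:
        return "file"      # ends in a (non-default) file name
    return "subroot" if len(segments) == 1 else "path"


def rerank_by_url_type(ranked_urls: list[str]) -> list[str]:
    """Naive re-rank of a baseline list by URL-type class.

    sorted() is stable, so URLs in the same class keep their baseline order.
    This is a hypothetical illustration, not the paper's combination method.
    """
    return sorted(ranked_urls, key=lambda u: URL_TYPE_RANK[url_type(u)])


if __name__ == "__main__":
    baseline = [
        "http://www.example.org/docs/guide.html",
        "http://www.example.org/",
        "http://www.example.org/docs/",
    ]
    for u in rerank_by_url_type(baseline):
        print(url_type(u), u)
```

The lexicographic re-rank is deliberately crude and was chosen only for brevity: it shows how a query-independent prior can reorder a content or anchor text baseline, not how the paper weights or combines the evidence.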