Exploring URL hit priors for web search

Authors: Ruihua Song; Guomao Xin; Shuming Shi; Ji-Rong Wen; Wei-Ying Ma
Affiliation (all authors): Microsoft Research Asia, 5F, Sigma Center, Beijing, P.R. China
Venue: ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Year: 2006

Abstract

A URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing work uses URL depth priors (i.e., the probability of being a good page given the length and depth of a URL) to improve certain types of Web search tasks. This paper proposes using the locations at which query terms occur in a URL to measure how well a Web page matches a user's information need. First, we define URL hit types and estimate URL hit priors, i.e., the prior probability of being a good answer given the type of query term hits in the URL. The main advantage of URL hit priors over depth priors is that they achieve stable improvement for both informational and navigational queries. Second, an obstacle to exploiting such priors is that shortening and concatenation are frequently used in URLs. Our investigation shows that only 30% of URL hits are recognized by an ordinary word-breaking approach, so we combine three methods to improve matching. Finally, the priors are integrated into the probabilistic model to enhance Web document retrieval. Our experiments, conducted on 7 query sets from TREC 2002, TREC 2003, and TREC 2004, show that the proposed approach is stable and improves retrieval effectiveness by 4% to 11% for navigational queries and by 10% for informational queries.
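
The idea of a URL hit prior can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not list the hit types, the three matching methods, or the exact form of the probabilistic model, so everything concrete here is an assumption made for illustration: the three hit types ("host", "path", "none"), the add-one smoothing, the log-space combination with a content score, and all function names and example URLs. The word breaker simply splits on non-alphanumeric characters, which is one plausible reading of an "ordinary word breaking approach".

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def url_tokens(text):
    """Naive word breaking: lowercase, then split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def hit_type(url, query):
    """Return a hypothetical hit type: the URL part in which a query term matches.

    The labels "host", "path" and "none" are illustrative assumptions,
    not the hit types defined in the paper.
    """
    terms = set(url_tokens(query))
    parts = urlsplit(url)
    if terms & set(url_tokens(parts.netloc)):
        return "host"
    if terms & set(url_tokens(parts.path)):
        return "path"
    return "none"


def estimate_priors(examples, smoothing=1.0):
    """Estimate P(good answer | hit type) from (url, query, is_relevant) triples.

    Uses add-one style smoothing (an assumption) so rare hit types
    never receive a prior of exactly 0 or 1.
    """
    total, relevant = Counter(), Counter()
    for url, query, is_relevant in examples:
        h = hit_type(url, query)
        total[h] += 1
        relevant[h] += int(is_relevant)
    return {h: (relevant[h] + smoothing) / (total[h] + 2 * smoothing) for h in total}


def rescore(content_score, url, query, priors, default=0.5):
    """Combine a positive content score with the hit prior in log space (assumed form)."""
    prior = priors.get(hit_type(url, query), default)
    return math.log(content_score) + math.log(prior)


# Toy, made-up relevance judgments used only to exercise the functions.
examples = [
    ("http://www.nasa.gov/", "nasa", True),
    ("http://www.nasa.gov/news", "nasa news", True),
    ("http://example.org/nasa/index.html", "nasa", False),
    ("http://example.org/about", "nasa", False),
]
priors = estimate_priors(examples)
```

With this naive word breaker, the query "white house" does not register a hit on `http://www.whitehouse.gov/`, because the host token `whitehouse` is a concatenation. This is the kind of missed hit the abstract attributes to shortening and concatenation, and it is why a stronger URL matching step is needed before the priors become useful.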