We investigate language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We analyze the prior probability of relevance for a wide range of non-content features, shedding further light on their importance for web retrieval. This analysis directly explains the success or failure of various techniques, e.g., why link topology is particularly helpful for singling out important sites. Language models can naturally incorporate multiple document representations, as well as non-content information. For the former, we employ mixture language models based on document full-text, incoming anchor-text, and document titles. For the latter, we study a range of priors based on document length, URL structure, and link topology. We look at three types of topics (distillation, home page, and named page) as well as a mixed query set. We find that the mixture models lead to considerable improvement of retrieval effectiveness for all topic types. The web-centric priors generally lead to further improvement of retrieval effectiveness.
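To make the combination of representations and priors concrete, here is a minimal sketch of how a mixture language model score with a document prior could be computed. It assumes a linear interpolation of per-representation maximum-likelihood estimates with a collection model, multiplied by a prior P(d). The function name `mixture_score`, the field names, the interpolation weights, and the toy data are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter

def mixture_score(query, reps, collection, weights, coll_weight, prior):
    """Return log( P(d) * prod_t [ sum_r w_r * P(t|d_r) + w_c * P(t|C) ] ).

    reps       : dict mapping a representation name to its token list
    collection : Counter of term frequencies over the whole collection
    weights    : dict mapping each representation name to its mixture weight
    coll_weight: weight of the collection (background) model
    prior      : non-content prior P(d), e.g. derived from length or URL type
    """
    coll_total = sum(collection.values())
    score = math.log(prior)
    for term in query:
        # Background probability; a Counter returns 0 for unseen terms.
        p = coll_weight * collection[term] / coll_total
        for name, tokens in reps.items():
            if tokens:  # skip empty representations (e.g. no incoming anchors)
                p += weights[name] * tokens.count(term) / len(tokens)
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += math.log(p)
    return score

# Hypothetical toy document with three representations.
doc = {"text": ["a", "b", "a", "c"], "anchor": ["a"], "title": ["b"]}
collection = Counter({"a": 5, "b": 3, "c": 2})
weights = {"text": 0.4, "anchor": 0.3, "title": 0.2}  # assumed values

uniform = mixture_score(["a"], doc, collection, weights, 0.1, prior=1.0)
boosted = mixture_score(["a"], doc, collection, weights, 0.1, prior=2.0)
```

In this sketch the prior enters as an additive term in log space, so a document favored by a non-content feature (here an arbitrary factor of 2) ranks above an otherwise identical document. How the priors and weights are actually estimated is a separate question that this example does not address.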