The TREC GOV2 collection is a large set of Web documents drawn from many separate Internet hosts, such as www.nih.gov and travel.state.gov. The number of Web pages (i.e., documents) per host varies considerably. In this paper, we present and evaluate a method for setting a maximum number of "hits" that may be presented for each Web host. Federated search environments are increasingly common components of digital libraries, and in these environments the benefit of such a maximum is that it can reduce the number of possibly relevant documents presented by each subcollection without hurting early precision measures such as P@20. The maximum is proportional to subcollection size but not sensitive to the search topic; its derivation is made possible by an analysis of patterns of relevance judgment across approximately 17,000 Web hosts in GOV2.
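To make the idea of a size-proportional, topic-independent per-host maximum concrete, here is a minimal sketch in Python. It is not the method derived in the paper: the function names, the `docs_per_hit` proportionality constant, and the `floor` of one hit per host are all hypothetical values chosen only for illustration. The sketch assumes a ranked list of URLs and a table of host sizes are already available.

```python
from urllib.parse import urlsplit


def host_cap(host_size, docs_per_hit=1000, floor=1):
    """Hypothetical cap: one allowed hit per `docs_per_hit` documents on the
    host, never fewer than `floor`. Depends only on host size, not the query."""
    return max(floor, host_size // docs_per_hit)


def cap_hits_per_host(ranked_urls, host_sizes, docs_per_hit=1000, floor=1):
    """Walk a ranked list in order and keep a URL only while its host is
    still under its cap. Relative order of the kept URLs is preserved."""
    kept = []
    shown = {}  # host -> number of hits already kept
    for url in ranked_urls:
        host = urlsplit(url).netloc
        limit = host_cap(host_sizes.get(host, 0), docs_per_hit, floor)
        if shown.get(host, 0) < limit:
            kept.append(url)
            shown[host] = shown.get(host, 0) + 1
    return kept


# Illustrative (made-up) host sizes and ranking.
host_sizes = {"www.nih.gov": 2000, "travel.state.gov": 300}
ranked = [
    "http://www.nih.gov/a",
    "http://www.nih.gov/b",
    "http://travel.state.gov/x",
    "http://www.nih.gov/c",
    "http://travel.state.gov/y",
]
result = cap_hits_per_host(ranked, host_sizes)
```

With these made-up numbers the larger host is allowed two hits and the smaller host one, so the third www.nih.gov page and the second travel.state.gov page are dropped while the remaining results keep their original order. Because the cap is computed from host size alone, the same limits apply to every search topic.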