This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view of the collection is not available. In particular, two options are compared on standard IR test collections: sampling documents from the target collection, and using a reference corpus that is independent of it. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when only the most frequent terms of a collection (an "extended stop word list") are known and all terms not in that list are treated equally. However, this list cannot always be fully estimated from a general-purpose reference corpus; some "domain-specific stop words" need to be added. A good way to achieve this is to mix estimates from small samples of the target retrieval collection with estimates derived from a reference corpus.
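As a rough illustration of the idea summarized above, the following Python sketch builds a pruned IDF table from a small sample of the target collection mixed with a reference corpus. It is not the paper's implementation: the function names, the linear mixing weight `mix`, the use of relative document frequency, and the choice of default weight for unlisted terms (the highest IDF in the list) are all assumptions made for the example.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each term counts once per document
    return df


def mixed_idf_table(sample_docs, reference_docs, top_k, mix=0.5):
    """Return (idf_table, default_idf) for the top_k most frequent terms.

    Assumption: relative document frequencies from the sample and the
    reference corpus are combined linearly with weight `mix`.
    """
    if not sample_docs or not reference_docs:
        raise ValueError("both document lists must be non-empty")
    sample_df = document_frequencies(sample_docs)
    ref_df = document_frequencies(reference_docs)
    n_sample, n_ref = len(sample_docs), len(reference_docs)

    rel = {}
    for term in set(sample_df) | set(ref_df):
        value = (mix * sample_df[term] / n_sample
                 + (1 - mix) * ref_df[term] / n_ref)
        if value > 0:  # skip terms whose mixed estimate is zero
            rel[term] = value

    # Keep only the most frequent terms: the "extended stop word list".
    frequent = sorted(rel, key=lambda t: (-rel[t], t))[:top_k]
    table = {t: -math.log(rel[t]) for t in frequent}

    # Assumption: every unlisted term gets the same weight, here the
    # highest IDF found in the list.
    default_idf = max(table.values()) if table else 0.0
    return table, default_idf


def term_weight(term, table, default_idf):
    """Look up a term's IDF; all terms outside the list are treated equally."""
    return table.get(term, default_idf)
```

With `mix=0.0` the table is estimated from the reference corpus alone, so a term that is frequent only in the target collection (a "domain-specific stop word") never enters the list. Raising `mix` lets the sample contribute such terms.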