Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Applying summarization techniques for term selection in relevance feedback
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Using sampled data and regression to merge search engine results
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Pruning long documents for distributed information retrieval
Proceedings of the eleventh international conference on Information and knowledge management
Overview of the second text retrieval conference (TREC-2)
HLT '94 Proceedings of the workshop on Human Language Technology
dg.o '04 Proceedings of the 2004 annual national conference on Digital government research
Foundations and Trends in Information Retrieval
Hi-index | 0.00 |
In environments containing many text search engines a federated search system provides people with a single point of access. When search engines are managed by independent organizations two key problems are discovering and representing the contents of each text database. Query-based sampling is a recent technique for discovering the contents of uncooperative databases so as to create database resource descriptions that support a variety of necessary capabilities. However, when the documents obtained by query-based sampling are very long, as is common in some government environments, disk storage costs can be surprisingly large. This paper investigates methods of pruning sampled documents to reduce storage costs. The experimental results demonstrate that disk storage costs can be reduced by 54-93% while causing only minor losses in federated search accuracy.