Query-based sampling is a method of discovering the contents of a text database by submitting queries to a search engine and observing the documents returned. In prior research, sampled documents were used to build resource descriptions for automatic database selection, and to build a centralized sample database for query expansion and result merging. An unstated assumption was that the associated storage costs were acceptable. When sampled documents are long, storage costs can be large. This paper investigates methods of pruning long documents to reduce storage costs. The experimental results demonstrate that building resource descriptions and centralized sample databases from the pruned contents of sampled documents can reduce storage costs by 54-93% while causing only minor losses in the accuracy of distributed information retrieval.
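
The sampling-then-pruning pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `search` callback, the parameter defaults, and in particular the pruning rule (keep only the first `keep` terms of each document) are assumptions made for illustration, since the abstract does not say which pruning methods were evaluated.

```python
import random


def query_based_sample(search, seed_term, max_docs=300, docs_per_query=4, rng=None):
    """Sample documents from a database that is reachable only through `search`.

    `search(term)` is a hypothetical callback returning a ranked list of
    (doc_id, text) pairs. Later query terms are drawn from the text of the
    documents sampled so far.
    """
    rng = rng or random.Random(0)
    sampled = {}                      # doc_id -> full text
    vocabulary = {seed_term: None}    # dict used as an insertion-ordered set
    tried = set()
    while len(sampled) < max_docs:
        candidates = [t for t in vocabulary if t not in tried]
        if not candidates:
            break                     # no unused query terms left
        term = rng.choice(candidates)
        tried.add(term)
        for doc_id, text in search(term)[:docs_per_query]:
            if len(sampled) >= max_docs:
                break
            if doc_id not in sampled:
                sampled[doc_id] = text
                for word in text.lower().split():
                    vocabulary.setdefault(word, None)
    return sampled


def prune_first_terms(text, keep=100):
    """Assumed pruning rule: keep only the first `keep` whitespace-separated terms."""
    return " ".join(text.split()[:keep])


def storage_reduction(original, pruned):
    """Fraction of characters saved by storing `pruned` instead of `original`."""
    before = sum(len(t) for t in original.values())
    after = sum(len(t) for t in pruned.values())
    return 0.0 if before == 0 else 1 - after / before
```

A caller would sample a database, prune each sampled document, and build both the resource description and the centralized sample database from the pruned text, for example `pruned = {d: prune_first_terms(t) for d, t in sampled.items()}`. `storage_reduction(sampled, pruned)` then gives the kind of storage saving the abstract reports, though the figures in the abstract come from the paper's own methods and data, not from this sketch.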