We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative's (OAI) Protocol for Metadata Harvesting, the approach occupies the middle ground between content crawling and distributed retrieval. As in crawling, some data move toward the retrieval process, but they are statistics about the content rather than the content itself; this allows more efficient use of network resources and a wider scope of application. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval; this reduces the costs of content provision while promoting the simplicity, effectiveness, and responsiveness of retrieval. Overall, we argue that the approach retains the good properties of centralized retrieval without giving up cost-effective, large-scale resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure. In particular, we define a minimal extension of the OAI protocol that supports the coordinated harvesting of full-content indices and descriptive metadata for content resources. Finally, we report on the implementation of a proof-of-concept prototype service for multimodel content-based retrieval of distributed file collections. © 2008 Wiley Periodicals, Inc.
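The division of labor described above (sources index locally, a central service periodically harvests only the index statistics that changed) can be illustrated with a minimal sketch. It is not the protocol extension or the prototype reported here: the class names `Source` and `CentralIndex`, the integer datestamps, the whitespace tokenizer, and the raw term-frequency ranking are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical autonomous document source that indexes its own content."""
    name: str
    docs: dict = field(default_factory=dict)  # doc_id -> (datestamp, text)

    def add(self, doc_id, datestamp, text):
        self.docs[doc_id] = (datestamp, text)

    def list_index_records(self, since=0):
        """Yield term statistics (never the text) for documents changed after `since`."""
        for doc_id, (stamp, text) in self.docs.items():
            if stamp > since:
                yield doc_id, stamp, Counter(text.lower().split())


class CentralIndex:
    """Hypothetical harvester that merges harvested statistics into one index."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {(source, doc_id): term frequency}
        self.doc_terms = {}                # (source, doc_id) -> terms currently indexed
        self.last_harvest = {}             # source name -> latest datestamp seen

    def harvest(self, source):
        """Incrementally harvest one source; return the number of records merged."""
        since = self.last_harvest.get(source.name, 0)
        latest, merged = since, 0
        for doc_id, stamp, tf in source.list_index_records(since):
            key = (source.name, doc_id)
            # Drop stale postings so an updated document replaces its old entry.
            for term in self.doc_terms.get(key, ()):
                self.postings[term].pop(key, None)
            for term, freq in tf.items():
                self.postings[term][key] = freq
            self.doc_terms[key] = set(tf)
            latest = max(latest, stamp)
            merged += 1
        self.last_harvest[source.name] = latest
        return merged

    def search(self, term):
        """Rank matching documents by raw term frequency (an assumed, simplistic model)."""
        hits = self.postings.get(term.lower(), {})
        return sorted(hits, key=lambda k: (-hits[k], k))
```

In this sketch retrieval runs entirely against the central index, while a repeated call to `harvest` transfers only records whose datestamp is newer than the previous harvest, which mirrors the incremental, statistics-only data movement argued for above.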