Having indexed much of the "surface" Web, search engines are now using various approaches to index the "deep" Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings. The authors harvested nearly 10 million records from OAI-PMH repositories. From these records, they extracted 3.3 million unique resource URLs and then searched for samples drawn from this collection to determine how much of the OAI-PMH corpus the three major search engines have indexed.
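The harvest, extract, and sample steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that records arrive as an OAI-PMH ListRecords response in the oai_dc (Dublin Core) format and that resource URLs appear in dc:identifier elements. The embedded sample response, the example.org URLs, and the function names extract_resource_urls and sample_urls are hypothetical and exist only for illustration.

```python
import random
import xml.etree.ElementTree as ET

# Dublin Core element namespace used by the oai_dc metadata format.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical ListRecords response; a real harvest would fetch this over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>local-id-42</dc:identifier>
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:3</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/3.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_resource_urls(xml_text):
    """Return the set of unique http(s) URLs found in dc:identifier elements."""
    root = ET.fromstring(xml_text)
    urls = set()
    for element in root.iter(DC_NS + "identifier"):
        value = (element.text or "").strip()
        # Keep only identifiers that look like web URLs; skip local IDs.
        if value.startswith(("http://", "https://")):
            urls.add(value)
    return urls


def sample_urls(urls, k, seed=0):
    """Draw up to k URLs uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    pool = sorted(urls)  # sort first so the sample does not depend on set order
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    unique_urls = extract_resource_urls(SAMPLE_RESPONSE)
    print(len(unique_urls), "unique resource URLs")
    print(sample_urls(unique_urls, 1))
```

Each sampled URL would then be submitted to a search engine to check whether it is indexed. That step is omitted here because it depends on each engine's query interface.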