Search Engine Coverage of the OAI-PMH Corpus

Authors:
Frank McCown;Xiaoming Liu;Michael L. Nelson;Mohammad Zubair
Affiliations:
Old Dominion University;Los Alamos National Laboratory;Old Dominion University;Old Dominion University
Venue:
IEEE Internet Computing
Year:
2006

Citing 6
Cited 8

Crawler-Friendly Web Servers

ACM SIGMETRICS Performance Evaluation Review
DP9: an OAI gateway service for web crawlers

Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Crawling the Hidden Web

Proceedings of the 27th International Conference on Very Large Data Bases
The indexable web is more than 11.5 billion pages

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Downloading textual hidden web content through keyword queries

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
mod_oai: an apache module for metadata harvesting

ECDL'05 Proceedings of the 9th European conference on Research and Advanced Technology for Digital Libraries

Lazy preservation: reconstructing websites by crawling the crawlers

WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management
Factors affecting website reconstruction from the web infrastructure

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Agreeing to disagree: search engines and their public interfaces

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Recovering a website's server components from the web infrastructure

Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Access and Exchange of Hierarchically Structured Resources on the Web with the NESTOR Framework

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Generating a meta-DL by federating search on OAI and non-OAI servers

Journal of Intelligent Information Systems
Collection-integral source selection for uncooperative distributed information retrieval environments

Information Sciences: an International Journal
The deep web in institutional repositories in Japan

Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem - Volume 47

Abstract

Having indexed much of the "surface" Web, search engines are now using various approaches to index the "deep" Web. At the same time, institutional repositories and digital libraries are adopting the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to expose their holdings. The authors harvested nearly 10 million records from OAI-PMH repositories. From these records, they extracted 3.3 million unique resource URLs and then searched for samples drawn from this collection to determine how much of the OAI-PMH corpus the three major search engines have indexed.
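
The extraction and sampling steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the harvested records use the oai_dc metadata format and that resource URLs appear in dc:identifier elements. The function names, the embedded sample response, and the example.org URLs are all hypothetical.

```python
import random
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def extract_resource_urls(list_records_xml):
    """Return the set of unique http(s) URLs found in dc:identifier elements.

    Assumption: resource URLs are carried in dc:identifier; identifiers that
    are not URLs (for example OAI identifiers) are skipped.
    """
    root = ET.fromstring(list_records_xml)
    urls = set()
    for elem in root.iter(DC_NS + "identifier"):
        text = (elem.text or "").strip()
        if text.lower().startswith(("http://", "https://")):
            urls.add(text)
    return urls


def sample_urls(urls, k, seed=0):
    """Draw up to k URLs uniformly at random, reproducibly for a given seed."""
    ordered = sorted(urls)  # fixed order so the seed fully determines the sample
    rng = random.Random(seed)
    return rng.sample(ordered, min(k, len(ordered)))


# Hypothetical, hand-written ListRecords response with one duplicated URL.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1</dc:identifier>
          <dc:identifier>oai:example.org:1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>http://example.org/papers/1</dc:identifier>
          <dc:identifier>http://example.org/papers/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    unique_urls = extract_resource_urls(SAMPLE_RESPONSE)
    print(len(unique_urls), "unique URLs")
    print(sample_urls(unique_urls, 1))
```

Each sampled URL would then be submitted as a query to a search engine to check whether it is indexed. That step depends on the search interface used and is left out of the sketch.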