Extending SDARTS: extracting metadata from web databases and interfacing with the open archives initiative

  • Authors:
  • Panagiotis G. Ipeirotis;Tom Barry;Luis Gravano

  • Affiliations:
  • Columbia University;Columbia University;Columbia University

  • Venue:
  • Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2002

Abstract

SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines two complementary existing protocols, SDLIP and STARTS, to define a uniform interface that collections should support for searching and for exporting metasearch-related metadata. SDARTS also includes a toolkit with wrappers that are easily customized to make both local and remote document collections SDARTS-compliant. This paper describes two significant ways in which we have extended the SDARTS toolkit. First, we have added a tool that automatically builds rich content summaries for remote web collections by probing the collections with appropriate queries. A metasearcher can then use these content summaries to select the collections over which to evaluate a given query. Second, we have enhanced the SDARTS toolkit so that all SDARTS-compliant collections export their metadata under the emerging Open Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also allows any OAI-compliant collection to be made SDARTS-compliant with minimal effort. As a result, we have implemented a bridge between SDARTS and OAI, which will facilitate interoperability among a potentially large number of collections. The SDARTS toolkit, with all related documentation and source code, is publicly available at http://sdarts.cs.columbia.edu.
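The abstract does not spell out how query probing produces a content summary. As a rough illustration only, the Python sketch below shows one simple way it could work: it sends single-word queries to a collection, records the reported number of matching documents as an estimate of each word's document frequency, and harvests new probe words from the documents that come back. The `search` callback, `build_content_summary`, `make_local_collection`, and the toy documents are hypothetical names invented for this example. They are not part of the SDARTS API, and the paper's tool may choose and issue its probes differently.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical interface (assumption): a collection answers a one-word query
# with the number of matching documents and the text of its top-k results.
SearchFn = Callable[[str], Tuple[int, List[str]]]


def build_content_summary(search: SearchFn, seed_words: List[str],
                          max_probes: int = 50) -> Dict[str, int]:
    """Estimate per-word document frequencies by probing with single-word queries."""
    summary: Dict[str, int] = {}
    queue = deque(seed_words)   # words still to be sent as probes
    seen = set(seed_words)      # words already queued or probed
    probes = 0
    while queue and probes < max_probes:
        word = queue.popleft()
        match_count, top_docs = search(word)
        probes += 1
        # The reported match count serves as the document-frequency estimate.
        summary[word] = match_count
        # Learn new candidate probe words from the returned documents.
        for doc in top_docs:
            for token in doc.lower().split():
                if token not in seen:
                    seen.add(token)
                    queue.append(token)
    return summary


def make_local_collection(docs: List[str], k: int = 2) -> SearchFn:
    """Toy stand-in for a remote web collection (hypothetical, for testing)."""
    def search(query: str) -> Tuple[int, List[str]]:
        q = query.lower()
        hits = [d for d in docs if q in d.lower().split()]
        return len(hits), hits[:k]
    return search


docs = [
    "metasearch over web databases",
    "open archives initiative metadata",
    "web databases hide content behind search forms",
]
summary = build_content_summary(make_local_collection(docs), ["web"], max_probes=10)
print(summary)
```

In this toy run, words from the second document (such as "open") never appear in the summary, because no probe ever retrieves that document. This illustrates why the choice of probe queries matters for how complete a content summary is.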