Index-Based Persistent Document Identifiers

  • Authors:
  • Diomidis Spinellis

  • Affiliations:
  • Department Management Science and Technology, Athens University of Economics and Business, Patision 76, GR-104 34 Athina, Greece. dds@aueb.gr

  • Venue:
  • Information Retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The infrastructure of a typical search engine can be used to calculate and resolve persistent document identifiers: a string that can uniquely identify and locate a document on the Internet without reference to its original location (URL). Bookmarking a document using such an identifier allows its retrieval even if the document's URL, and, in many cases, its contents change. Web client applications can offer facilities for users to bookmark a page by reference to a search engine and the persistent identifier instead of the original URL. The identifiers are calculated using a global Internet term index; a document's unique identifier consists of a word or word combination that occurs uniquely in the specific document. We use a genetic algorithm to locate a minimal unique document identifier: the shortest word or word combination that will locate the document. We tested our approach by implementing tools for indexing a document collection, calculating the persistent identifiers, performing queries, and distributing the computation and storage load among many computers.