Index-Based Persistent Document Identifiers

Authors:
Diomidis Spinellis
Affiliations:
Department of Management Science and Technology, Athens University of Economics and Business, Patision 76, GR-104 34 Athina, Greece. dds@aueb.gr
Venue:
Information Retrieval
Year:
2005

Abstract

The infrastructure of a typical search engine can be used to calculate and resolve persistent document identifiers: strings that uniquely identify and locate a document on the Internet without reference to its original location (URL). Bookmarking a document using such an identifier allows its retrieval even if the document's URL and, in many cases, its contents change. Web client applications can let users bookmark a page by reference to a search engine and the persistent identifier instead of the original URL. The identifiers are calculated using a global Internet term index; a document's unique identifier consists of a word or word combination that occurs only in that document. We use a genetic algorithm to locate a minimal unique document identifier: the shortest word or word combination that will locate the document. We tested our approach by implementing tools for indexing a document collection, calculating the persistent identifiers, performing queries, and distributing the computation and storage load among many computers.
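
The idea of a minimal unique identifier can be illustrated with a small sketch. The code below is not the paper's implementation: it replaces the genetic algorithm with an exhaustive search over term combinations of increasing size, which is only practical for toy collections. It also assumes whitespace tokenization and an in-memory index. The names `build_index`, `minimal_identifier`, and `resolve`, along with the sample documents, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def resolve(identifier, index):
    """Return the ids of all documents containing every term of the identifier."""
    if not identifier:
        return set()
    postings = [index.get(term, set()) for term in identifier]
    return set.intersection(*postings)


def minimal_identifier(doc_id, docs, index, max_terms=3):
    """Find the smallest term combination that matches only doc_id.

    Exhaustive search (an assumption of this sketch, standing in for the
    genetic algorithm): try single terms first, then pairs, and so on.
    Rarer terms are tried first so ties are broken deterministically.
    """
    terms = sorted(
        set(docs[doc_id].lower().split()),
        key=lambda t: (len(index[t]), t),
    )
    for size in range(1, max_terms + 1):
        for combo in combinations(terms, size):
            if resolve(combo, index) == {doc_id}:
                return combo
    return None  # no unique combination within max_terms


# Hypothetical three-document collection.
docs = {
    "a": "genetic algorithms search the web",
    "b": "web search engines index documents",
    "c": "genetic algorithms index documents",
}
index = build_index(docs)

for doc_id in docs:
    print(doc_id, minimal_identifier(doc_id, docs, index))
```

In this toy collection, documents `a` and `b` each contain a word that appears nowhere else, so a single term identifies them. Document `c` shares every word with another document and needs a two-word combination. Because the identifier is resolved against the index rather than a URL, it keeps working if the document moves, as long as the identifying terms remain in it and remain unique.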