PaSE: locating online copy of scientific documents effectively

  • Authors:
  • Byung-Won On;Dongwon Lee

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, PA;School of Information and Sciences and Technology, The Pennsylvania State University, PA

  • Venue:
  • ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

The need for fast and vast dissemination of research results has led a new trend such that more number of authors post their documents to personal or group Web spaces so that others can easily access and download them. Similarly, more and more researchers use online search for accessing documents of interest in Web, instead of paying a visit to libraries. Currently, to locate and download an online copy of a particular document D, one typically (1) uses Search Engines with the citation information and browses through returned web pages (e.g., author's homepage) to see if any contains D, or (2) uses searching facilities of an individual Digital Library (e.g., CiteSeer, e-Print) looking for D, and if not found, repeats the search in another Digital Library. However, the scheme (1) involves human browsing to get to the final online copy, while the scheme (2) suffers from incomplete coverage. To remedy these shortcomings, in this paper, we present a system, named as PaSE, which can effectively locate online copies (e.g., PDF or PS) of scientific documents using citation information. We consider a myriad of alternatives in crawling and parsing the Web to arrive at the right document quickly, and present a preliminary experimental study. Using some of the best alternatives that we have identified, we show that PaSE can locate online copy of documents more accurately and conveniently than human users would do at the cost of elongated search time.