SeerSuite: developing a scalable and reliable application framework for building digital libraries by crawling the web

  • Authors:
  • Pradeep B. Teregowda; Isaac G. Councill; R. Juan Pablo Fernández; Madian Khabsa; Shuyi Zheng; C. Lee Giles

  • Affiliations:
  • Pennsylvania State University; Google; Pennsylvania State University; Pennsylvania State University; Pennsylvania State University; Pennsylvania State University

  • Venue:
  • WebApps'10: Proceedings of the 2010 USENIX Conference on Web Application Development
  • Year:
  • 2010

Abstract

SeerSuite is a framework for building scientific and academic digital libraries and search engines by crawling scientific and academic documents from the web, with a focus on providing reliable, robust services. In addition to full-text indexing, SeerSuite supports autonomous citation indexing and automatically links references in research articles to facilitate navigation, analysis, and evaluation. By automatically extracting, storing, and indexing metadata, SeerSuite enables access to extensive document, citation, and author metadata. SeerSuite also supports MyCiteSeer, a personal portal that allows users to monitor documents, store user queries, build document portfolios, and interact with the document metadata. We describe the design of SeerSuite and the deployment and usage of CiteSeerX as an instance of SeerSuite.