SeerSuite: developing a scalable and reliable application framework for building digital libraries by crawling the web

Authors:
Pradeep B. Teregowda; Isaac G. Councill; R. Juan Pablo Fernández; Madian Khabsa; Shuyi Zheng; C. Lee Giles
Affiliations:
Pennsylvania State University; Google; Pennsylvania State University; Pennsylvania State University; Pennsylvania State University; Pennsylvania State University
Venue:
WebApps'10 Proceedings of the 2010 USENIX conference on Web application development
Year:
2010

Abstract

SeerSuite is a framework for building scientific and academic digital libraries and search engines by crawling scientific and academic documents from the web, with a focus on providing reliable, robust services. In addition to full-text indexing, SeerSuite supports autonomous citation indexing and automatically links references in research articles to facilitate navigation, analysis, and evaluation. It provides access to extensive document, citation, and author metadata by automatically extracting, storing, and indexing that metadata. SeerSuite also supports MyCiteSeer, a personal portal that allows users to monitor documents, store queries, build document portfolios, and interact with document metadata. We describe the design of SeerSuite and the deployment and usage of CiteSeerX as an instance of SeerSuite.