Workload analysis for scientific literature digital libraries

  • Authors:
  • Huajing Li;Wang-Chien Lee;Anand Sivasubramaniam;C. Lee Giles

  • Affiliations:
  • The Pennsylvania State University, Department of Computer Science and Engineering, 16802, University Park, PA, USA;The Pennsylvania State University, Department of Computer Science and Engineering, 16802, University Park, PA, USA;The Pennsylvania State University, Department of Computer Science and Engineering, 16802, University Park, PA, USA;The Pennsylvania State University, Department of Computer Science and Engineering, College of Information Sciences and Technology, 16802, University Park, PA, USA

  • Venue:
  • International Journal on Digital Libraries - Special Issue on Very Large Digital Libraries
  • Year:
  • 2008

Abstract

Workload studies of large-scale systems can help locate potential bottlenecks and improve performance. However, previous workload analyses of Web applications have typically focused on generic platforms, neglecting the unique characteristics exhibited by individual application domains. Different application domains have intrinsically heterogeneous characteristics, which directly affect system performance. In this study, we present an extensive analysis of the workload of scientific literature digital libraries, unveiling their temporal and user interest patterns. Logs of a computer science literature digital library, CiteSeer, are collected and analyzed. We intentionally remove service details specific to CiteSeer. We believe our analysis is applicable to other systems with similar characteristics. While many of our findings are consistent with previous Web analyses, we discover several characteristics unique to scientific literature digital library workloads. Furthermore, we discuss how to use our findings to improve system performance.