Because web applications are popular and heavily used, a good understanding of their workloads is important for improving the performance of search services. Existing work has typically focused on generic web workloads, with little emphasis on specific domains. In this paper, we analyze the usage logs of CiteSeer, a scientific literature digital library and search engine, to characterize workloads for both robots and users. We propose the essential ingredients that contribute to workloads. Among them, we find that access intervals show high variance and thus cannot be predicted well with time-series models. On the other hand, client visiting paths and semantics can be captured well with probabilistic models and Zipf's law. Based on these findings, we propose SearchGen, a synthetic workload generator that outputs traces for scientific literature digital libraries and search engines. A comparison between synthetic workloads and actual logged traces suggests that the synthetic workload fits well.
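
The two modeling ideas named in the abstract, a probabilistic model of the client visiting path and Zipf-distributed popularity, can be illustrated with a minimal sketch. This is not SearchGen itself: the page types, the transition probabilities, the Zipf exponent, and the function names below are all hypothetical values chosen for illustration, and a first-order Markov chain is only one possible choice of probabilistic model.

```python
import random
from itertools import accumulate

def zipf_weights(n, s=1.0):
    """Unnormalized Zipf weights: the item of rank r gets weight 1 / r**s."""
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]

# Hypothetical first-order Markov chain over request types.
# Each row lists the probability of the next state given the current one.
TRANSITIONS = {
    "home":     {"search": 0.7, "view": 0.2, "exit": 0.1},
    "search":   {"search": 0.3, "view": 0.5, "exit": 0.2},
    "view":     {"search": 0.3, "download": 0.4, "exit": 0.3},
    "download": {"search": 0.4, "view": 0.2, "exit": 0.4},
}

def generate_session(rng, queries, cum_weights, max_steps=50):
    """Generate one synthetic session as a list of (state, query) pairs.

    The query is drawn from a Zipf-like popularity distribution for
    "search" requests and is None for every other request type.
    """
    state = "home"
    trace = []
    for _ in range(max_steps):
        if state == "search":
            query = rng.choices(queries, cum_weights=cum_weights, k=1)[0]
            trace.append((state, query))
        else:
            trace.append((state, None))
        next_states = TRANSITIONS[state]
        state = rng.choices(list(next_states),
                            weights=list(next_states.values()), k=1)[0]
        if state == "exit":
            break
    return trace

def make_query_model(n_queries=100, s=1.0):
    """Build a hypothetical query vocabulary with cumulative Zipf weights."""
    queries = [f"q{i}" for i in range(1, n_queries + 1)]
    cum_weights = list(accumulate(zipf_weights(n_queries, s)))
    return queries, cum_weights
```

A session could then be produced with `queries, cum = make_query_model()` followed by `generate_session(random.Random(0), queries, cum)`. The sketch deliberately leaves out inter-request timing, since the abstract reports that access intervals are too variable to predict well with time-series models.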