Empirical systems research is facing a dilemma. Minor aspects of an experimental setup can have a significant impact on the associated performance measurements and potentially invalidate the conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growth in complexity and size of modern systems will further aggravate this dilemma, especially given the time pressure to produce results. So how can one trust any reported empirical analysis of a new idea or concept in computer science? This paper introduces DataMill, a community-based, easy-to-use, service-oriented open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest results on hidden factors and automates the variation of these factors. Multiple research groups already participate in DataMill. DataMill is also of interest for research on performance evaluation itself: the infrastructure supports quantifying the effect of hidden factors and disseminating these research results beyond mere reporting, and it provides a platform for investigating interactions and compositions of hidden factors.
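To make the notion of a hidden factor concrete, the following minimal sketch varies one such factor, the process environment size, across repeated runs of a benchmark and reports the mean wall-clock time per setting. This is not part of DataMill; the benchmark command and padding sizes are hypothetical placeholders chosen only for illustration.

```python
#!/usr/bin/env python3
"""Sketch: vary a single hidden factor (process environment size) across
benchmark runs to expose its effect on measured performance.
The benchmark command and padding levels are illustrative assumptions."""

import os
import subprocess
import time

BENCHMARK_CMD = ["true"]              # placeholder; substitute the benchmark under test
PAD_SIZES = [0, 1024, 4096, 16384]    # bytes of extra environment padding (arbitrary levels)
RUNS_PER_LEVEL = 5

for pad in PAD_SIZES:
    env = dict(os.environ)
    # Extra environment bytes shift the initial stack layout, a known hidden factor.
    env["EXPERIMENT_PADDING"] = "x" * pad
    samples = []
    for _ in range(RUNS_PER_LEVEL):
        start = time.perf_counter()
        subprocess.run(BENCHMARK_CMD, env=env, check=True,
                       stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    mean = sum(samples) / len(samples)
    print(f"env padding {pad:6d} B: mean wall time {mean:.4f} s "
          f"over {RUNS_PER_LEVEL} runs")
```

As described in the abstract, DataMill automates this kind of factor variation within the infrastructure itself, so experimenters do not have to hand-script it for each study.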