Soft timers: efficient microsecond software timer support for network processing
ACM Transactions on Computer Systems (TOCS)
The structure of the “THE”-multiprogramming system
Communications of the ACM
NAMD: biomolecular simulation on thousands of processors
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Effects of clock resolution on the scheduling of interactive and soft real-time processes
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Hardware support for real-time operating systems
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Desktop scheduling: how can we know what the user wants?
NOSSDAV '04 Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video
Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
System noise, OS clock ticks, and fine-grained parallel applications
Proceedings of the 19th annual international conference on Supercomputing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Process prioritization using output production: Scheduling for multimedia
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Fine grained kernel logging with KLogger: experience and insights
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Hi-index | 0.00 |
The overhead of a context switch is typically associated with multitasking, where several applications share a processor. But even if only one runnable application is present in the system and supposedly runs alone, it is still repeatedly preempted in favor of a different thread of execution, namely, the operating system that services periodic clock interrupts. We employ two complementing methodologies to measure the overhead incurred by such events and obtain contradictory results. The first methodology systematically changes the interrupt frequency and measures by how much this prolongs the duration of a program that sorts an array. The over-all overhead is found to be 0.5-1.5% at 1000 Hz, linearly proportional to the tick rate, and steadily declining as the speed of processors increases. If the kernel is configured such that each tick is slowed down by an access to an external time source, then the direct overhead dominates. Otherwise, the relative weight of the indirect portion is steadily growing with processors' speed, accounting for up to 85% of the total. The second methodology repeatedly executes a simplistic loop (calibrated to take 1ms), measures the actual execution time, and analyzes the perturbations. Some loop implementations yield results similar to the above, but others indicate that the overhead is actually an order of magnitude bigger, or worse. The phenomenon was observed on IA32, IA64, and Power processors, the latter being part of the ASC Purple supercomputer. Indeed, the effect is dramatically amplified for parallel jobs, where one late thread holds up all its peers, causing a slowdown that is dominated by the per-node latency (numerator) and the job granularity (denominator). We show the effect is due to an unexplained interrupt/loop interaction; the question of whether this hardware misfeature is experienced by real applications remains open.