For operating system intensive applications, the ability of designers to understand system call performance behavior is essential to achieving high performance. Conventional performance tools, such as monitoring tools and profilers, collect and present their information off-line or via out-of-band channels. We believe that making this information first-class and exposing it to applications via in-band channels on a per-call basis presents opportunities for performance analysis and tuning not available via other mechanisms. Furthermore, our approach provides direct feedback to applications on time spent in the kernel, resource contention, and time spent blocked, allowing them to immediately observe how their actions affect kernel behavior. Not only does this approach provide greater transparency into the workings of the kernel, but it also allows applications to control how performance information is collected, filtered, and correlated with application-level events. To demonstrate the power of this approach, we show that our implementation, DeBox, obtains precise information about OS behavior at low cost, and that it can be used in debugging and tuning application performance on complex workloads. In particular, we focus on the industry-standard SpecWeb99 benchmark running on the Flash Web Server. Using DeBox, we are able to diagnose a series of problematic interactions between the server and the OS. Addressing these issues as well as other optimization opportunities generates an overall factor of four improvement in our SpecWeb99 score, throughput gains on other benchmarks, and latency reductions ranging from a factor of 4 to 47.
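The idea of returning performance information in-band with each call can be illustrated with a rough user-space sketch. This is not DeBox's interface or implementation: DeBox gathers its data inside the kernel, whereas the sketch below only approximates the idea from outside by sampling `resource.getrusage` around a call. The names `CallPerf` and `measured` and all field names are hypothetical, and the counters are process-wide, so in a multithreaded program they would mix activity from other threads.

```python
import os
import resource  # Unix-only standard library module
import time
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class CallPerf:
    """Hypothetical per-call record; the field names are illustrative only."""
    name: str
    wall_ns: int             # elapsed wall-clock time for the call
    sys_cpu_ns: int          # process system CPU time consumed during the call
    voluntary_switches: int  # voluntary context switches, a rough proxy for blocking


def measured(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, CallPerf]:
    """Run func and return (result, CallPerf) so the caller sees the cost in-band."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter_ns()
    result = func(*args, **kwargs)
    wall_ns = time.perf_counter_ns() - start
    after = resource.getrusage(resource.RUSAGE_SELF)

    perf = CallPerf(
        name=getattr(func, "__name__", repr(func)),
        wall_ns=wall_ns,
        # ru_stime is reported in seconds as a float; convert the delta to ns.
        sys_cpu_ns=int((after.ru_stime - before.ru_stime) * 1e9),
        voluntary_switches=after.ru_nvcsw - before.ru_nvcsw,
    )
    return result, perf


if __name__ == "__main__":
    # The application decides what to do with each record: log it, filter it,
    # or correlate it with its own request-level events.
    stat_result, perf = measured(os.stat, ".")
    print(perf)
```

Because the record comes back alongside the result, the caller can keep only the calls that matter, for example those whose `wall_ns` greatly exceeds `sys_cpu_ns`, which suggests time spent blocked rather than computing.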