ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Systems performance measurement on PCI Pamette
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
TradeOffs in Supporting Two Page Sizes
TradeOffs in Supporting Two Page Sizes
Practical, transparent operating system support for superpages
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Protection strategies for direct access to virtualized I/O devices
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
On the DMA mapping problem in direct device assignment
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
vIOMMU: efficient IOMMU emulation
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
DaaC: device-reserved memory as an eviction-based file cache
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
NetVM: high performance and flexible networking using virtualization on commodity platforms
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
The input/output memory management unit (IOMMU) was recently introduced into mainstream computer architecture when both Intel and AMD added IOMMUs to their chip-sets. An IOMMU provides memory protection from I/O devices by enabling system software to control which areas of physical memory an I/O device may access. However, this protection incurs additional direct memory access (DMA) overhead due to the required address resolution and validation. IOMMUs include an input/output translation lookaside buffer (IOTLB) to speed-up address resolution, but still every IOTLB cache-miss causes a substantial increase in DMA latency and performance degradation of DMA-intensive workloads. In this paper we first demonstrate the potential negative impact of IOTLB cache-misses on workload performance. We then propose both system software and hardware enhancements to reduce IOTLB miss rate and accelerate address resolution. These enhancements can lead to a reduction of over 60% in IOTLB miss-rate for common I/O intensive workloads.