Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Initial Observations of the Simultaneous Multithreading Pentium 4 Processor
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Enterprise IT Trends and Implications for Architecture Research
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A performance counter architecture for computing accurate CPI components
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Power provisioning for a warehouse-sized computer
Proceedings of the 34th annual international symposium on Computer architecture
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Cloud9: a software testing service
ACM SIGOPS Operating Systems Review
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Benchmarking cloud serving systems with YCSB
Proceedings of the 1st ACM symposium on Cloud computing
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Web search using mobile cores: quantifying and mitigating the price of efficiency
Proceedings of the 37th annual international symposium on Computer architecture
The impact of management operations on the virtualized datacenter
Proceedings of the 37th annual international symposium on Computer architecture
CloudCmp: shopping for a cloud made easy
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
CloudCmp: comparing public cloud providers
IMC '10 Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
The impact of memory subsystem resource sharing on datacenter applications
Proceedings of the 38th annual international symposium on Computer architecture
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Toward Dark Silicon in Servers
IEEE Micro
Reducing OLTP instruction misses with thread migration
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Proceedings of the 39th Annual International Symposium on Computer Architecture
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
From A to E: analyzing TPC's OLTP benchmarks: the obsolete, the ubiquitous, the unexplored
Proceedings of the 16th International Conference on Extending Database Technology
KnightShift: Scaling the Energy Proportionality Wall through Server-Level Heterogeneity
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Rethinking DRAM Power Modes for Energy Proportionality
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
NOC-Out: Microarchitecting a Scale-Out Processor
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Towards energy-proportional computing for enterprise-class server workloads
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
OLTP in wonderland: where do cache misses come from in major OLTP components?
Proceedings of the Ninth International Workshop on Data Management on New Hardware
From embedded multi-core SoCs to scale-out processors
Proceedings of the Conference on Design, Automation and Test in Europe
Correlation-aware virtual machine allocation for energy-efficient datacenters
Proceedings of the Conference on Design, Automation and Test in Europe
Efficient virtual memory for big memory servers
Proceedings of the 40th Annual International Symposium on Computer Architecture
STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution
Proceedings of the 40th Annual International Symposium on Computer Architecture
Proceedings of the 40th Annual International Symposium on Computer Architecture
Proceedings of the 40th Annual International Symposium on Computer Architecture
Virtualizing power distribution in datacenters
Proceedings of the 40th Annual International Symposium on Computer Architecture
Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers
Proceedings of the 40th Annual International Symposium on Computer Architecture
RISO: relaxed network-on-chip isolation for cloud processors
Proceedings of the 50th Annual Design Automation Conference
Guide-copy: fast and silent migration of virtual machine for datacenters
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
On evaluating commercial Cloud services: A systematic review
Journal of Systems and Software
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Coloring the cloud for predictable performance
Proceedings of the 4th annual Symposium on Cloud Computing
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SHIFT: shared history instruction fetch for lean-core server processors
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Market mechanisms for managing datacenters with heterogeneous microarchitectures
ACM Transactions on Computer Systems (TOCS)
Ubik: efficient cache sharing with strict qos for latency-critical workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Integrated 3D-stacked server designs for increasing physical density of key-value stores
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Rhythm: harnessing data parallel hardware for server workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
Apple-CORE: Harnessing general-purpose many-cores with hardware concurrency management
Microprocessors & Microsystems
Proceedings of the 5th ACM/SPEC international conference on Performance engineering
Hi-index | 0.00 |
Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads. In this work, we introduce CloudSuite, a benchmark suite of emerging scale-out workloads. We use performance counters on modern servers to study scale-out workloads, finding that today's predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today's predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.