Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Experience with processes and monitors in Mesa
Communications of the ACM
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Multiple Reservations and the Oklahoma Update
IEEE Parallel & Distributed Technology: Systems & Technology
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Kicking the tires of software transactional memory: why the going gets tough
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Early experience with a commercial hardware transactional memory implementation
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
QuakeTM: parallelizing a complex sequential application using transactional memory
Proceedings of the 23rd international conference on Supercomputing
Simplifying concurrent algorithms by exploiting hardware transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
RMS-TM: a comprehensive benchmark suite for transactional memory systems
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
DISC'06 Proceedings of the 20th international conference on Distributed Computing
CloudRAMSort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
High Performance Non-uniform FFT on Modern X86-based Multi-core Systems
IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Evaluation of Blue Gene/Q hardware support for transactional memories
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
What scientific applications can benefit from hardware transactional memory?
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Transactional Memory Architecture and Implementation for IBM System Z
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
SI-TM: reducing transactional memory abort rates through snapshot isolation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Scaling existing lock-based applications with lock elision
Communications of the ACM
Scaling Existing Lock-based Applications with Lock Elision
Queue - Performance
Hi-index | 0.02 |
Intel has recently introduced Intel® Transactional Synchronization Extensions (Intel® TSX) in the Intel 4th Generation Core™ Processors. With Intel TSX, a processor can dynamically determine whether threads need to serialize through lock-protected critical sections. In this paper, we evaluate the first hardware implementation of Intel TSX using a set of high-performance computing (HPC) workloads, and demonstrate that applying Intel TSX to these workloads can provide significant performance improvements. On a set of real-world HPC workloads, applying Intel TSX provides an average speedup of 1.41x. When applied to a parallel user-level TCP/IP stack, Intel TSX provides 1.31x average bandwidth improvement on network intensive applications. We also demonstrate the ease with which we were able to apply Intel TSX to the various workloads.