The case for hardware transactional memory in software packet processing
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Transactional memories for multi-processor FPGA platforms
Journal of Systems Architecture: the EUROMICRO Journal
NetTM: faster and easier synchronization for soft multicores via transactional memory
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Emulating transactional memory on FPGA multiprocessors
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
From plasma to beefarm: design experience of an FPGA-based multicore prototype
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Application-specific signatures for transactional memory in soft processors
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Application-specific signatures for transactional memory in soft processors
ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications
What scientific applications can benefit from hardware transactional memory?
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Resource-bounded multicore emulation using Beefarm
Microprocessors & Microsystems
Hi-index | 0.00 |
Programming efficiency of heterogeneous concurrent systems is limited by the use of lock-based synchronization mechanisms. Transactional memories can greatly improve the programming efficiency of such systems. In field-programmable computing machines, a conventional fixed transactional memory becomes inefficient use of the silicon. We propose configurable transactional memory (CTM) as a mechanism to implement application specific synchronization that utilizes the field-programmability of such devices to match with the requirements of an application. The proposed configurable transactional memory is targeted at embedded applications and is area efficient compared to conventional schemes that are implemented with cache-coherent protocols. In particular, the CTM is designed to be incorporated in to compilation and synthesis paths of either high-level languages or during system creation process using tools such as Xilinx EDK. We study the impact of deploying a CTM in a packet metering and statistics application and two micro-benchmarks as compared to a lock-based synchronization scheme. We have implemented this application in a Xilinx Virtex4 device and found that the CTM was 0-73% better than a fine-grained lock-based scheme.