ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Flexible Control of Parallelism in a Multiprocessor PC Router
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
The case for hardware transactional memory in software packet processing
Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Hi-index | 0.00 |
Network processors are being asked to perform increasingly complex operations on packets of information at faster and faster rates. Because processor performance and memory cycle times are not keeping up with this demand, there is a fundamental need for simultaneous processing of multiple packets, and the degree of this parallelism is increasing. Sometimes a dependency exists between two packets currently being operated on, and as the ratio of packet processing time to packet transmission time increases, these dependencies are more likely to impact performance. Thus, the way packet dependencies are handled will become critical. In this paper we show that there is potentially a dramatic difference in performance between optimal and non-optimal solutions. We argue that this is the key challenge that must be addressed in highly parallel network processors. We discuss how work in thread level speculation relates to this problem and describe a practical hardware implementation that requires little or no changes to software and with near optimal performance.