Recursive design of hardware priority queues

Authors:
Yehuda Afek;Anat Bremler-Barr;Liron Schiff
Affiliations:
Tel Aviv University, Tel Aviv, Israel;The Interdisciplinary Center, Hertzelia, Israel;Tel Aviv University, Tel Aviv, Israel
Venue:
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2013

Abstract

A fast, recursive construction of an n-element priority queue from exponentially smaller hardware priority queues and a RAM of size n is presented. All priority queue implementations to date require either O(log n) instructions per operation, space exponential in the key size, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence, constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k-1 primitive priority queues of size n^(1/k) (k = 2, 3, ...) and a RAM of size n, where the throughput of the construct beats that of a single primitive hardware priority queue of size n. For example, an n-element PQ can be constructed from either three primitive hardware priority queues of size √n or five of size ∛n. Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, which provides scalable, perfect line-rate fair queuing for millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies. As a by-product of our technique, we present an O(n) time sorting algorithm for a system equipped with a TCAM of O(w√n) entries, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used a TCAM of Ω(n) entries. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
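
To make the sizes quoted in the abstract concrete, the following minimal sketch computes how many primitive hardware queues a PPQ of capacity n would use for a given k, and how large each one would be (the k-th root of n). The function names `kth_root_ceil` and `ppq_budget` are hypothetical, and rounding the root up for values of n that are not perfect k-th powers is an assumption made here for illustration. The abstract does not specify it. This is only the arithmetic, not the PPQ construction itself.

```python
def kth_root_ceil(n: int, k: int) -> int:
    """Return the smallest integer r with r**k >= n, using exact integer math."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** k >= n:
            hi = mid          # mid is large enough; try something smaller
        else:
            lo = mid + 1      # mid is too small
    return lo


def ppq_budget(n: int, k: int) -> tuple[int, int]:
    """Return (number of primitive queues, size of each) for capacity n.

    Follows the counts stated in the abstract: 2k-1 queues, each holding
    about the k-th root of n elements. Rounding up is an assumption.
    """
    if k < 2:
        raise ValueError("the abstract considers k = 2, 3, ...")
    return 2 * k - 1, kth_root_ceil(n, k)


if __name__ == "__main__":
    n = 1_000_000
    for k in (2, 3):
        queues, size = ppq_budget(n, k)
        print(f"k={k}: {queues} primitive queues of size {size}")
    # k=2: 3 primitive queues of size 1000
    # k=3: 5 primitive queues of size 100
```

For one million elements, this gives three queues of 1,000 entries or five queues of 100 entries, alongside the RAM of size n in both cases.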