We study a streaming network application, video transcoding, executed on a multi-core server. The scheduler must minimize total processing time and preserve good video quality in an energy-efficient manner. However, existing scheduling schemes are limited because they make ineffective use of multi-core architecture characteristics and do not differentiate transcoding costs in terms of energy consumption. In this paper, we identify three key factors that together affect transcoding performance: memory access (M), core/cache topology (C), and transcoding format cost (C), or MC^2 for short. Based on MC^2, we propose E-AHRW, an Energy-efficient Adaptive Highest Random Weight hash scheduler that extends the HRW scheduler proposed for packet scheduling on a homogeneous multiprocessor. E-AHRW achieves stream locality and load balancing at both the stream and packet (frame) levels by adaptively adjusting the hashing decision according to the real-time weighted queue length of each processing unit (PU). Based on E-AHRW, we also design, implement, and evaluate a hash-tree scheduler that further reduces computation cost and achieves more effective load balancing on multi-core architectures. Through implementation on an Intel Xeon server and evaluation on a realistic workload, we demonstrate that E-AHRW improves throughput, energy efficiency, and video quality owing to better load balancing, a lower L2 cache miss rate, and negligible scheduling overhead.
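The idea of adjusting an HRW hashing decision by queue length can be illustrated with a minimal sketch. This is not the E-AHRW algorithm itself, whose adaptation rule is not given in the abstract. It assumes plain highest-random-weight (rendezvous) hashing over PU identifiers, plus a hypothetical overload rule: a stream goes to its highest-ranked PU unless that PU's weighted queue length reaches a threshold, in which case the next PU in rank order is tried. The names `hrw_weight`, `rank_pus`, `pick_pu`, and `threshold` are illustrative assumptions, as is the use of SHA-256 as the hash.

```python
import hashlib


def hrw_weight(stream_id: str, pu_id: int) -> int:
    """Pseudo-random weight for a (stream, PU) pair (hash choice is an assumption)."""
    digest = hashlib.sha256(f"{stream_id}:{pu_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def rank_pus(stream_id: str, pu_ids: list[int]) -> list[int]:
    """Order PUs from highest to lowest HRW weight for this stream."""
    return sorted(pu_ids, key=lambda pu: hrw_weight(stream_id, pu), reverse=True)


def pick_pu(stream_id: str, weighted_queue: dict[int, float], threshold: float) -> int:
    """Pick a PU for the stream.

    weighted_queue maps each PU id to its current weighted queue length,
    e.g. queued frames scaled by a per-format cost (hypothetical weighting).
    The overload rule below is an assumption, not the paper's rule.
    """
    ranked = rank_pus(stream_id, list(weighted_queue))
    for pu in ranked:
        # Prefer the highest-ranked PU that is not overloaded (keeps locality).
        if weighted_queue[pu] < threshold:
            return pu
    # Every PU is overloaded: fall back to the least-loaded one.
    return min(ranked, key=lambda pu: weighted_queue[pu])


if __name__ == "__main__":
    queues = {0: 1.0, 1: 6.5, 2: 0.0, 3: 2.0}
    for stream in ("stream-a", "stream-b", "stream-c"):
        print(stream, "->", pick_pu(stream, queues, threshold=5.0))
```

Because each stream's ranking depends only on the stream and PU identifiers, a stream keeps returning to the same PU while that PU stays below the threshold, which is the locality property the abstract attributes to hash-based scheduling; the queue-length check is what supplies the load-balancing behavior in this sketch.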