Implementation of Real-Time MPEG-4 FGS Encoder

Authors:
Yen-Kuang Chen; Wen-Hsiao Peng
Affiliations:
-;-
Venue:
PCM '02 Proceedings of the Third IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Year:
2002

Abstract

Although computers keep getting faster, implementing the latest video codecs in software in real time remains challenging. This paper presents our techniques for optimizing the speed of MPEG-4 Fine Granularity Scalability (FGS) video encoders. First, zigzag scans are slow operations in video encoding and decoding. State-of-the-art processors use hardware data prefetchers to reduce memory latency, but the nonsequential addresses of a zigzag scan can prevent the prefetcher from tracking the access pattern. The problem is even more serious in MPEG-4 FGS, where bit-plane coding requires multiple scans. Bit-plane encoding accounts for more than 30% of CPU time in an MPEG-4 FGS encoder (base layer and enhancement layer together). In this work, we rearrange the layout of the image structure so that zigzag scans access sequential memory locations. After the rearrangement, the reads are prefetched and we see an 80% speed-up in bit-plane encoding. Second, variable-length coding (VLC) incurs a huge number of unpredictable conditional branches. Modern processors can keep tens of instructions in flight in their pipelines, so a mispredicted branch reduces pipeline efficiency. The problem is more severe in MPEG-4 FGS, which needs multiple bit-plane VLCs. Bit-plane VLCs take more than half of the CPU time of the MPEG-4 FGS enhancement-layer encoder. In this work, we also design a bit-plane VLC algorithm that has fewer unpredictable branches. The new design reduces mispredicted branches by 2.4x. After these changes, the overall speed-up of our MPEG-4 FGS software encoder is 1.4x, without any assembly or MMX technology optimization.
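The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes an 8x8 block, that each block is permuted into zigzag order once so that every later bit-plane pass walks memory linearly, and that a bit-plane is coded as (run, end-of-plane) symbols. The bit-mask trick used to find the set bits is one possible way to avoid a data-dependent branch per coefficient; the abstract does not describe the actual algorithm.

```python
# Illustrative sketch only; all names are hypothetical, not from the paper.

def zigzag_order(n=8):
    """Return the raster indices of an n x n block in zigzag scan order."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    # Group by anti-diagonal; odd diagonals run top to bottom (row ascending),
    # even diagonals run bottom to top (column ascending).
    coords.sort(key=lambda rc: (rc[0] + rc[1],
                                rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [r * n + c for r, c in coords]

ZIGZAG = zigzag_order(8)

def to_zigzag_layout(block_raster):
    """Permute a raster-order block once; later scans are then sequential."""
    return [block_raster[i] for i in ZIGZAG]

def bit_plane_mask(coeffs_zz, plane):
    """Pack bit `plane` of each |coefficient| into one integer.

    Bit k of the result corresponds to zigzag position k, so this loop
    reads the zigzag-ordered array strictly front to back.
    """
    mask = 0
    for k, v in enumerate(coeffs_zz):
        mask |= ((abs(v) >> plane) & 1) << k
    return mask

def run_eop_symbols(mask):
    """Turn one bit-plane mask into (run, eop) pairs.

    The loop iterates once per set bit, not once per coefficient, by
    isolating the lowest set bit each time. An empty result means the
    plane is all zero, which a real coder would signal separately.
    """
    symbols = []
    pos = 0  # first zigzag position not yet covered by a symbol
    while mask:
        low = mask & -mask            # isolate the lowest set bit
        idx = low.bit_length() - 1    # its zigzag position
        mask ^= low                   # clear it
        symbols.append((idx - pos, 1 if mask == 0 else 0))
        pos = idx + 1
    return symbols
```

For a block whose nonzero magnitudes in zigzag order are 10, 0, 6, 0, 0, 3, 0, 2, 2, 0, 0, 2, 0, 0, 1, calling `run_eop_symbols(bit_plane_mask(zz, 3))` gives `[(0, 1)]` for the most significant plane. In a compiled language the lowest-set-bit step would typically map to a count-trailing-zeros operation, which is why this formulation has few hard-to-predict branches.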