Implementation and comparison of Serial and Parallel algorithms of SAO in HEVC
Date
2017-12-05
Author
Nagathihalli Jagadish, Harsha
0000-0002-5974-6996
Abstract
The High Efficiency Video Coding (HEVC) standard is the latest video coding project developed
by the Joint Collaborative Team on Video Coding (JCT-VC), which brings together the
International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Video
Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG)
standardization organizations. HEVC, also known as H.265, supports encoding video over a wide
range of resolutions, from low resolution to beyond High Definition, e.g. 4K or 8K. The HEVC
standard improves on the previous standard, H.264/AVC (Advanced Video Coding), which is well
established and widely used in industry, with applications in broadcast TV and multimedia
telephony. HEVC succeeds H.264/AVC and achieves a bit-rate reduction of about 50% at the same
visual quality.
The in-loop filters are an important part of the HEVC video coding standard. They attenuate
discontinuities at the prediction and transform boundaries, and they also improve quality by
attenuating ringing artifacts and correcting changes in sample intensity, with the correction
chosen by a classification algorithm. The main advantage of these filters is that they improve
the subjective quality of the reconstructed video.
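The classification-based correction described above can be illustrated with a minimal sketch. The abstract does not spell out the classification rule; the code below assumes the edge-offset categories defined in the HEVC standard (local minimum, two kinds of edge, local maximum), restricted to one row and the horizontal direction. The names `edge_category` and `sao_edge_offset_row`, and the offset values in the example, are hypothetical and are not taken from the thesis.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


# edge_idx = 2 + sign(c - a) + sign(c - b) lies in 0..4 and is mapped to
# the edge-offset category: 1 = local minimum, 2 and 3 = edges,
# 4 = local maximum, 0 = none (no offset applied).
_EDGE_IDX_TO_CATEGORY = (1, 2, 0, 3, 4)


def edge_category(a, c, b):
    """Classify sample c against its two neighbours a and b."""
    return _EDGE_IDX_TO_CATEGORY[2 + sign(c - a) + sign(c - b)]


def sao_edge_offset_row(row, offsets, bit_depth=8):
    """Apply horizontal edge offsets to one row of samples.

    offsets maps categories 1..4 to signed offsets (illustrative values,
    an assumption of this sketch). Border samples are left unchanged
    because they lack a neighbour on one side.
    """
    max_val = (1 << bit_depth) - 1
    out = list(row)
    for i in range(1, len(row) - 1):
        # Classification reads only the unfiltered input row, so every
        # sample can be processed independently of the others.
        category = edge_category(row[i - 1], row[i], row[i + 1])
        if category:
            out[i] = min(max_val, max(0, row[i] + offsets[category]))
    return out


example_offsets = {1: 2, 2: 1, 3: -1, 4: -2}
filtered = sao_edge_offset_row([10, 8, 10, 12, 12, 9], example_offsets)
```

Because each output sample depends only on unfiltered input samples, the loop body has no dependency between iterations, which is the property a per-sample GPU mapping would rely on.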
In HEVC, the size of motion-predicted blocks ranges from 8x4 and 4x8 up to 64x64 luma samples,
while the size of block transforms and intra-predicted blocks ranges from 4x4 to 32x32 samples.
These blocks can be coded independently of the neighboring blocks, which allows scope for
parallelism. Various methods have been implemented serially to reduce the computational
complexity of sample adaptive offset (SAO). To improve the efficiency of the implementation,
an extra step is taken here to implement the code in parallel, since the blocks can be coded
independently of each other. Computing is moving rapidly towards parallelization to reduce
the time spent on computation, and multi-core and many-core computation and design are the
current trends in the market. In this thesis, an attempt is therefore made to map the video
coding algorithm onto GPU cores to accelerate execution. This is done for the SAO algorithm
using CUDA programming. SAO has several stages, and each of these stages is implemented in
parallel on NVIDIA GPUs. The results obtained in serial and in parallel are compared using the
speedup metric, and the quality is measured using PSNR (Peak Signal-to-Noise Ratio).
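The two evaluation metrics named above can be written down as a short sketch. It assumes the usual definitions: speedup as serial time divided by parallel time, and PSNR as 10·log10(peak² / MSE) with peak = 2^bit_depth − 1. The function names and the flat-list sample layout are assumptions made for illustration, not the thesis's own code.

```python
import math


def speedup(serial_seconds, parallel_seconds):
    """Speedup of the parallel run relative to the serial run."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return serial_seconds / parallel_seconds


def psnr(original, reconstructed, bit_depth=8):
    """PSNR in dB between two equally sized flat sequences of samples."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    peak = (1 << bit_depth) - 1
    return 10 * math.log10(peak * peak / mse)
```

A speedup above 1 means the parallel version ran faster. PSNR should stay the same between the serial and parallel versions if both apply identical offsets.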