Hand-Over-Face Segmentation
Abstract
Accurate hand segmentation is vital in applications where the hands play a central role, such as sign language recognition, action recognition, and gesture recognition. A relatively unexplored obstacle to correct hand segmentation arises when the hand overlaps the face. The lack of a dataset for this research area has been one motivation for this work. Accordingly, this dissertation investigates the hand-over-face segmentation task and proposes improvements for it.
Toward an in-depth study of the hand segmentation problem, this dissertation makes several contributions. First, it presents a survey of sign language recognition systems on mobile phones, which provides a recent practical example of the need for a hand segmentation dataset and comprehensive research on the problem. Second, it presents a literature review that covers and summarizes all available hand segmentation datasets. Third, I provide a public dataset (VLM-HandOverFace) for the hand segmentation task. This newly constructed dataset contains 4384 labeled frames and includes color, depth, and infrared streams recorded by Kinect. Several state-of-the-art architectures are evaluated on the VLM-HandOverFace dataset. Furthermore, this dissertation proposes the Multi-level Pyramid Scene Parsing Network (MPSPNet) for semantic segmentation. I also provide a thorough discussion and evaluation of the proposed model and of the characteristics that make it applicable to the hand-over-face segmentation challenge.
Several experiments were conducted to evaluate MPSPNet on two object segmentation datasets and two hand segmentation datasets. The results show that the proposed method achieves at least a 6% improvement in mean intersection over union (mIoU) compared with all state-of-the-art methods. Finally, various experiments were conducted to measure the impact of including temporal motion information in MPSPNet.
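For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how it is commonly computed for a pair of label masks. It is not the dissertation's evaluation code: the function name `mean_iou`, the toy masks, the class labels (0 for background, 1 for hand), and the choice to average only over classes that appear in either mask are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two label masks.

    pred and target are equally sized 2D lists of integer class labels.
    Classes absent from both masks are skipped when averaging
    (an assumption; evaluation protocols differ on this point).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks; labels are illustrative: 0 = background, 1 = hand.
target = [[0, 0, 1],
          [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]

# Background IoU is 2/3 and hand IoU is 3/4, so the mean is about 0.708.
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.3f}")
```

In this toy case a single mislabeled pixel lowers both per-class scores, which is why mIoU is sensitive to errors along object boundaries such as the hand-face border.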