Video-based Face Recognition using Deep Learning for Single Sample Per Person (SSPP) Surveillance Applications

Parchami, Mostafa

PARCHAMI-DISSERTATION-2017.pdf (12.30Mb)

Author

Parchami, Mostafa

0000-0003-3106-3808

Abstract

Face Recognition (FR) is the task of identifying a person from images of their face. Video-based FR systems for video surveillance seek to recognize individuals of interest in real time over a distributed network of surveillance cameras. These systems operate in challenging unconstrained environments, where the appearance of faces captured in video varies with pose, expression, illumination, occlusion, blur, scale, etc. In addition, facial models for matching must be designed using a single reference facial image per target individual, captured with a high-quality still camera under controlled conditions. Deep learning has shown great improvement in both low-level and high-level computer vision tasks; in particular, it outperforms traditional machine learning algorithms in FR applications. Unfortunately, such methods are not designed to overcome the challenges of video-based FR, such as the difference between source and target domains, the single sample per person (SSPP) problem, and low-quality images. More sophisticated algorithms are therefore needed. We propose to design different deep learning architectures and compare their capabilities under these conditions. Deep learning can learn not only how to discriminate between faces but also how to extract more distinctive features for FR applications. Thus, in each chapter we pursue a different type of deep convolutional neural network to extract meaningful face representations that are similar for faces of the same person and different for faces of different persons. Chapter 2 provides a novel method for implementing cross-correlation in deep learning architectures and uses transfer learning to overcome the SSPP aspect of the problem. Chapter 3 improves the results by employing a triplet-loss training method. Chapter 4 uses a much more complex architecture for face embedding to achieve better accuracy. Chapter 5 employs a convolutional autoencoder to frontalize faces, and Chapter 6 shows another application of cross-correlation in deep learning. Extensive experiments confirm that all of the proposed methods outperform traditional computer vision systems.
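
The abstract's goal of representations that are "similar for faces of the same person and different for faces of different persons" is what a triplet loss encourages. The abstract does not give the dissertation's exact formulation, so the sketch below uses the commonly cited margin-based form on squared Euclidean distances. The function names, the plain-list embeddings, and the margin value of 0.2 are illustrative assumptions, not details from the dissertation.

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, chosen only for illustration
) -> float:
    """Margin-based triplet loss for one (anchor, positive, negative) triplet.

    anchor and positive are embeddings of the same person; negative is an
    embedding of a different person. The loss is zero once the negative is
    farther from the anchor than the positive by at least `margin`.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a single-sample-per-person setting, the anchor could be the embedding of the one reference still and the positive an embedding of a video face of the same individual; how triplets are actually selected in Chapter 3 is not stated in the abstract.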

URI

http://hdl.handle.net/10106/31665