Video-based Face Recognition using Deep Learning for Single Sample Per Person (SSPP) Surveillance Applications
Abstract
Face Recognition (FR) is the task of identifying a person from images of that person's face.
Systems for video-based face recognition in video surveillance seek to recognize
individuals of interest in real time over a distributed network of surveillance cameras. These
systems are exposed to challenging, unconstrained environments, where the appearance of
faces captured in videos varies according to pose, expression, illumination, occlusion, blur,
scale, etc. In addition, facial models for matching must be designed using a single reference
facial image per target individual, captured with a high-quality still camera under controlled
conditions. Deep learning has brought substantial improvements to both low-level and high-level
computer vision tasks. In particular, deep learning outperforms traditional machine
learning algorithms in FR applications. Unfortunately, such methods are not designed to
overcome the challenges of video-based FR, such as the difference between source and target
domains, the single sample per person (SSPP) problem, and low-quality images. Therefore, more
sophisticated algorithms should be designed to overcome these challenges. We propose to design different
deep learning architectures and compare their capabilities under such circumstances. Deep
learning can not only learn to discriminate between faces but also learn to extract
more distinctive features for FR applications. Thus, in each chapter we pursue a different
type of deep convolutional neural network to extract meaningful face representations that
are similar for faces of the same person and different for faces of different persons. Chapter
2 proposes a novel method for implementing cross-correlation in deep learning architectures
and leverages transfer learning to address the SSPP aspect of the problem. Chapter 3 then
improves on these results by employing a triplet-loss training method. Chapter 4 uses a much
more complex face-embedding architecture to achieve better accuracy. Chapter 5 employs a
convolutional autoencoder to frontalize faces, and, finally, Chapter 6 presents another application
of cross-correlation in deep learning. Extensive experiments confirm that all of the proposed
methods outperform traditional computer vision systems.
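As a rough illustration of the triplet-loss training mentioned for Chapter 3, the sketch below computes the standard hinge-style triplet loss on squared Euclidean distances between face embeddings: an anchor face should lie closer to another face of the same person (the positive) than to a face of a different person (the negative) by at least a margin. This is not the thesis's implementation; the margin value of 0.2, the function names, and the use of plain Python lists as embeddings are assumptions made for illustration. In practice the embeddings would come from a deep CNN and the loss would be minimized by gradient descent.

```python
from typing import Sequence, Tuple

Embedding = Sequence[float]


def squared_distance(x: Embedding, y: Embedding) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    if len(x) != len(y):
        raise ValueError("embeddings must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(x, y))


def triplet_loss(anchor: Embedding, positive: Embedding,
                 negative: Embedding, margin: float = 0.2) -> float:
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    Returns 0 when the negative is farther from the anchor than the
    positive by at least `margin` (assumed value 0.2); otherwise returns
    the amount by which that margin is violated.
    """
    d_pos = squared_distance(anchor, positive)  # same identity
    d_neg = squared_distance(anchor, negative)  # different identity
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets: Sequence[Tuple[Embedding, Embedding, Embedding]],
                       margin: float = 0.2) -> float:
    """Mean triplet loss over a batch of (anchor, positive, negative) tuples."""
    if not triplets:
        raise ValueError("batch must contain at least one triplet")
    total = sum(triplet_loss(a, p, n, margin) for a, p, n in triplets)
    return total / len(triplets)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    easy_negative, hard_negative = [1.0, 0.0], [0.2, 0.0]
    print(triplet_loss(anchor, positive, easy_negative))  # margin satisfied
    print(triplet_loss(anchor, positive, hard_negative))  # margin violated
```

The easy negative already lies well beyond the margin and contributes no loss, while the hard negative, which sits almost as close to the anchor as the positive, produces a positive loss. This is why triplet-based training typically focuses on mining such hard triplets.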