Show simple item record

dc.contributor.advisor: Athitsos, Vassilis
dc.creator: Rezaei, Mohammad
dc.date.accessioned: 2022-08-31T12:45:37Z
dc.date.available: 2022-08-31T12:45:37Z
dc.date.created: 2022-08
dc.date.issued: 2022-08-23
dc.date.submitted: August 2022
dc.identifier.uri: http://hdl.handle.net/10106/30922
dc.description.abstract: Hand analysis using vision systems is necessary for interaction between people and digital devices and is thus crucial in many applications relating to computer vision and human-computer interaction (HCI). This dissertation explores hand analysis from depth images along two lines: hand part segmentation and 3D hand pose estimation. First, we investigate hand part segmentation from depth images, formulated as a semantic segmentation task: determining, for every pixel, which hand part it belongs to. The proposed method performs this task without requiring ground-truth segmentation labels for training. Instead, it uses the 3D hand pose annotations already provided with hand pose datasets as a form of weak supervision. Both qualitative and quantitative experiments confirm the effectiveness of the proposed method. Second, we investigate a method that enables accurate 3D hand pose estimation from depth images. This is achieved by a novel formulation that decomposes 3D hand pose estimation into the estimation of 2D joint locations in the depth image space (UV) and the estimation of their corresponding depths, aided by two complementary attention maps. This decomposition prevents depth estimation, which is the more difficult task, from interfering with the UV estimation at both the prediction and feature levels. We empirically show that the proposed decomposition, together with its interaction with two complementary attention maps estimated by two separate branches of the model, leads to state-of-the-art accuracy on three public 3D hand pose estimation benchmark datasets. Finally, we explore a semi-supervised method for 3D hand pose estimation from depth images, aimed at reducing the model's reliance during training on ground-truth annotations, which are costly to acquire. This goal is achieved by adopting a student-teacher framework. The teacher network is trained by taking advantage of consistency training and by adapting the latest advancements in semi-supervised image classification methods. It generates pseudo-labels for training the student network. As training progresses, the teacher network improves and generates more accurate pseudo-labels, resulting in further improvement of the student network. For inference at test time, only the student network is used; the teacher network is discarded after training. We conduct several experiments to demonstrate the effectiveness of the proposed framework.
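The decomposition described in the abstract, UV estimation kept separate from attention-guided depth estimation, can be illustrated with a minimal sketch. This is not the dissertation's architecture: the heatmap, attention logits, and function names below are hypothetical stand-ins for the outputs of the model's two branches. A joint's UV comes from the argmax of its 2D heatmap, while its depth is read out as an attention-weighted average over the depth image, so the depth branch cannot perturb the UV prediction.

```python
import math

def softmax2d(scores):
    # numerically stable softmax over a 2D grid of attention logits
    flat = [v for row in scores for v in row]
    m = max(flat)
    exps = [[math.exp(v - m) for v in row] for row in scores]
    z = sum(v for row in exps for v in row)
    return [[v / z for v in row] for row in exps]

def estimate_joint(depth_img, heatmap, attn_logits):
    # UV branch: discrete argmax over the joint's 2D heatmap
    h, w = len(heatmap), len(heatmap[0])
    u, v = max(((i, j) for i in range(h) for j in range(w)),
               key=lambda ij: heatmap[ij[0]][ij[1]])
    # depth branch: an attention map selects where to read depth from;
    # it shares no state with the UV branch above
    attn = softmax2d(attn_logits)
    z = sum(attn[i][j] * depth_img[i][j]
            for i in range(h) for j in range(w))
    return u, v, z

# toy 3x3 example: both branches focus on cell (1, 2)
depth_img = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.42], [0.5, 0.5, 0.5]]
heatmap = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
attn_logits = [[0, 0, 0], [0, 0, 10], [0, 0, 0]]
u, v, z = estimate_joint(depth_img, heatmap, attn_logits)
```

Because the UV readout never touches the attention or depth values, an error in the depth branch leaves the 2D joint location unchanged, which is the separation the abstract describes at the prediction level.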
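The student-teacher loop from the final part of the abstract can also be sketched in miniature. This is an illustrative toy, not the dissertation's method: a 1-D linear regressor stands in for the pose networks, and the teacher is updated as an exponential moving average of the student, a common choice in consistency-training methods that is assumed here rather than taken from the source.

```python
import random

def predict(w, x):
    return w * x  # stand-in for a pose-estimation network

def train_step(w, x, target, lr=0.1):
    # one gradient step on squared error against a label or pseudo-label
    grad = 2 * (predict(w, x) - target) * x
    return w - lr * grad

def ema_update(teacher_w, student_w, decay=0.9):
    # assumed update rule: teacher tracks a moving average of the student
    return decay * teacher_w + (1 - decay) * student_w

random.seed(0)
true_w = 3.0
labeled = [(x, true_w * x) for x in (0.5, 1.0, 1.5)]   # few labeled samples
unlabeled = [random.uniform(0.0, 2.0) for _ in range(50)]

student_w = teacher_w = 0.0
for epoch in range(200):
    # teacher generates pseudo-labels for the unlabeled data
    for x in unlabeled:
        student_w = train_step(student_w, x, predict(teacher_w, x))
    # student also sees the small labeled set
    for x, y in labeled:
        student_w = train_step(student_w, x, y)
    # as training progresses the teacher improves, so its pseudo-labels do too
    teacher_w = ema_update(teacher_w, student_w)

# at test time only the student is used; the teacher is discarded
```

The loop mirrors the abstract's feedback cycle: a better teacher yields better pseudo-labels, which yield a better student, which in turn improves the teacher.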
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: 3D hand pose estimation
dc.subject: Hand part segmentation
dc.subject: Deep learning
dc.subject: Semi-supervised learning
dc.title: HAND ANALYSIS FROM DEPTH IMAGES
dc.type: Thesis
dc.degree.department: Computer Science and Engineering
dc.degree.name: Doctor of Philosophy in Computer Science
dc.date.updated: 2022-08-31T12:45:37Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Computer Science
dc.type.material: text

