Show simple item record

dc.contributor.advisor: Athitsos, Vassilis
dc.creator: Rezaei, Mohammad
dc.date.accessioned: 2022-08-31T12:45:37Z
dc.date.available: 2022-08-31T12:45:37Z
dc.date.created: 2022-08
dc.date.issued: 2022-08-23
dc.date.submitted: August 2022
dc.identifier.uri: http://hdl.handle.net/10106/30922
dc.description.abstract: Hand analysis using vision systems is necessary for interaction between people and digital devices and is thus crucial in many applications relating to computer vision and human-computer interaction (HCI). This dissertation explores hand analysis from depth images along two lines: hand part segmentation and 3D hand pose estimation. First, we investigate hand part segmentation from depth images, formulated as a semantic segmentation task: determining, for every pixel, which hand part it belongs to. The proposed method performs this task without requiring ground-truth segmentation labels for training. Instead, it uses the 3D hand pose annotations already provided with hand pose datasets as a form of weak supervision. Both qualitative and quantitative experiments confirm the effectiveness of the proposed method. Second, we investigate a method that enables accurate 3D hand pose estimation from depth images. This is achieved by a novel formulation that decomposes 3D hand pose estimation into the estimation of 2D joint locations in the depth image space (UV) and the estimation of their corresponding depths, aided by two complementary attention maps. This decomposition prevents depth estimation, which is the more difficult task, from interfering with the UV estimation at both the prediction and feature levels. We empirically show that the proposed decomposition, together with its interaction with two complementary attention maps estimated by two separate branches of the model, leads to state-of-the-art accuracy on three public 3D hand pose estimation benchmark datasets. Finally, we explore a semi-supervised method for 3D hand pose estimation from depth images, aimed at reducing the model's reliance during training on ground-truth annotations, which are costly to acquire. This goal is achieved by adopting a student-teacher framework. The teacher network is trained by taking advantage of consistency training and by adapting the latest advancements in semi-supervised image classification methods. It generates pseudo-labels for training the student network. As training progresses, the teacher network improves and generates more accurate pseudo-labels, resulting in further improvement of the student network. For inference at test time, only the student network is used; the teacher network is discarded after training. We conduct several experiments to demonstrate the effectiveness of the proposed framework.
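The decomposition described in the abstract, UV estimation kept separate from attention-guided depth estimation, can be illustrated with a minimal sketch. This is not the dissertation's architecture: the heatmap, attention logits, and function names below are hypothetical stand-ins for the outputs of the model's two branches. A joint's UV comes from the argmax of its 2D heatmap, while its depth is read out as an attention-weighted average over the depth image, so the depth branch cannot perturb the UV prediction.

```python
import math

def softmax2d(scores):
    # numerically stable softmax over a 2D grid of attention logits
    flat = [v for row in scores for v in row]
    m = max(flat)
    exps = [[math.exp(v - m) for v in row] for row in scores]
    z = sum(v for row in exps for v in row)
    return [[v / z for v in row] for row in exps]

def estimate_joint(depth_img, heatmap, attn_logits):
    # UV branch: discrete argmax over the joint's 2D heatmap
    h, w = len(heatmap), len(heatmap[0])
    u, v = max(((i, j) for i in range(h) for j in range(w)),
               key=lambda ij: heatmap[ij[0]][ij[1]])
    # depth branch: an attention map selects where to read depth from;
    # it shares no state with the UV branch above
    attn = softmax2d(attn_logits)
    z = sum(attn[i][j] * depth_img[i][j]
            for i in range(h) for j in range(w))
    return u, v, z

# toy 3x3 example: both branches focus on cell (1, 2)
depth_img = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.42], [0.5, 0.5, 0.5]]
heatmap = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
attn_logits = [[0, 0, 0], [0, 0, 10], [0, 0, 0]]
u, v, z = estimate_joint(depth_img, heatmap, attn_logits)
```

Because the UV readout never touches the attention or depth values, an error in the depth branch leaves the 2D joint location unchanged, which is the separation the abstract describes at the prediction level.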
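The student-teacher loop from the final part of the abstract can also be sketched in miniature. This is an illustrative toy, not the dissertation's method: a 1-D linear regressor stands in for the pose networks, and the teacher is updated as an exponential moving average of the student, a common choice in consistency-training methods that is assumed here rather than taken from the source.

```python
import random

def predict(w, x):
    return w * x  # stand-in for a pose-estimation network

def train_step(w, x, target, lr=0.1):
    # one gradient step on squared error against a label or pseudo-label
    grad = 2 * (predict(w, x) - target) * x
    return w - lr * grad

def ema_update(teacher_w, student_w, decay=0.9):
    # assumed update rule: teacher tracks a moving average of the student
    return decay * teacher_w + (1 - decay) * student_w

random.seed(0)
true_w = 3.0
labeled = [(x, true_w * x) for x in (0.5, 1.0, 1.5)]   # few labeled samples
unlabeled = [random.uniform(0.0, 2.0) for _ in range(50)]

student_w = teacher_w = 0.0
for epoch in range(200):
    # teacher generates pseudo-labels for the unlabeled data
    for x in unlabeled:
        student_w = train_step(student_w, x, predict(teacher_w, x))
    # student also sees the small labeled set
    for x, y in labeled:
        student_w = train_step(student_w, x, y)
    # as training progresses the teacher improves, so its pseudo-labels do too
    teacher_w = ema_update(teacher_w, student_w)

# at test time only the student is used; the teacher is discarded
```

The loop mirrors the abstract's feedback cycle: a better teacher yields better pseudo-labels, which yield a better student, which in turn improves the teacher.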
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.subject: 3D hand pose estimation
dc.subject: Hand part segmentation
dc.subject: Deep learning
dc.subject: Semi-supervised learning
dc.title: HAND ANALYSIS FROM DEPTH IMAGES
dc.type: Thesis
dc.degree.department: Computer Science and Engineering
dc.degree.name: Doctor of Philosophy in Computer Science
dc.date.updated: 2022-08-31T12:45:37Z
thesis.degree.department: Computer Science and Engineering
thesis.degree.grantor: The University of Texas at Arlington
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy in Computer Science
dc.type.material: text

