Show simple item record

dc.contributor.advisor	Athitsos, Vassilis
dc.creator	Sayed, Saif
dc.date.accessioned	2023-01-26T16:13:52Z
dc.date.available	2023-01-26T16:13:52Z
dc.date.created	2022-12
dc.date.issued	2022-11-07
dc.date.submitted	December 2022
dc.identifier.uri	http://hdl.handle.net/10106/31011
dc.description.abstract	Automatic understanding of human behavior has several applications in medicine and surveillance. Analysing human actions can enable cognitive assessment of children by measuring their hyperactivity and response inhibition, giving physicians a better understanding of their cognitive state. Automated, non-invasive assessment of cognitive disorders would make these detection methods more affordable and widely available, and could prove life-changing for a child's development. Human activity can also be analysed in common settings such as cooking in a kitchen, where information about human-object interaction provides priors on the underlying activity being performed. In the first section, we focus on cognitive assessment. Specifically, we introduce a new dataset toward the development of an automated system for the Activate Test of Embodied Cognition (ATEC), a measure that evaluates cognitive skills through physical activity. Evaluating cognitive skills through physical activity requires subjects to perform a wide variety of tasks with varying levels of complexity. To make the system affordable and reachable to a larger population, we created an automated system that can score these human activities as accurately as an expert. To this end, we developed an activity recognition system for one of the most challenging tasks in ATEC, called Cross-Your-Body, which evaluates attention, response inhibition, rhythm and coordination, task switching, and working memory. We created and annotated a dataset that enabled the training of vision-based activity segmentation models. First, we developed a highly accurate system that takes a trimmed video as input, where every video contains only one action, and predicts the human activity by tracking human pose features. Second, we improved the system into an end-to-end method that can track multiple activities in an untrimmed video, enabling the generation of scores that transfer directly to an expert human's scores with high inter-rater reliability. In the second section, we study action segmentation in instructional videos under timestamp supervision. In the action segmentation domain, the goal is to temporally divide the input video into a set of sequential actions. In the fully supervised setting, training labels are given for every frame, while in the weakly supervised setting, labels are given at the video level as a sequence of actions. While weakly supervised labels reduce the annotation time for labeling videos, test performance lags that of the fully supervised setting by a large margin. To alleviate this problem, timestamp supervision adds, in addition to the sequence of actions, a single annotated frame number for each action, which places significant constraints on when each activity may happen. We study timestamp supervision under several scenarios. First, we created a new approach that utilizes human-object interaction (HOI) as a source of information beyond the existing optical flow and RGB information. The system creates new pseudo-ground-truth by expanding the timestamp annotations using information from an off-the-shelf pre-trained HOI detector, requiring no additional HOI-related annotations. We also replaced the temporal convolution-based temporal modeling with a transformer-based one, which further improved performance. Second, to enable research on HOI and multi-view action segmentation, we created a first-of-its-kind dataset called (3+1)Rec, which contains 1799 long, high-quality videos comprising three third-person views and one egocentric view for each dish a subject prepares in a kitchen environment.
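The three supervision regimes contrasted in the abstract can be made concrete with a toy example. This sketch is illustrative only and is not from the thesis: the video, action names, and the naive nearest-timestamp expansion below are all assumptions, standing in for the learned pseudo-ground-truth expansion the abstract describes.

```python
# Toy 10-frame video whose true frame-level segmentation is:
# "pour" on frames 0-3, "stir" on frames 4-9 (hypothetical example).

# Fully supervised: one action label per frame.
full_labels = ["pour"] * 4 + ["stir"] * 6

# Weakly supervised: only the ordered sequence of actions, no timing.
weak_labels = ["pour", "stir"]

# Timestamp supervision: the ordered actions plus a single annotated
# frame index inside each action segment (indices chosen arbitrarily).
timestamp_labels = [("pour", 1), ("stir", 6)]

def expand_timestamps(stamps, num_frames):
    """Naive pseudo-ground-truth expansion: assign every frame to the
    action whose annotated timestamp is nearest. A simple stand-in for
    the HOI-informed expansion methods referenced in the abstract."""
    labels = []
    for f in range(num_frames):
        action, _ = min(stamps, key=lambda s: abs(s[1] - f))
        labels.append(action)
    return labels

pseudo = expand_timestamps(timestamp_labels, 10)
# For this toy video the expansion happens to recover the full labels.
```

Even this crude expansion shows why a single frame per action is so much stronger a signal than the video-level sequence alone: each timestamp anchors its action in time.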
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Action segmentation
dc.subject	Computer vision
dc.subject	Cognitive assessment
dc.title	Understanding human actions: Cognitive assessment and action segmentation using human object interaction
dc.type	Thesis
dc.date.updated	2023-01-26T16:13:53Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text
dc.creator.orcid	0000-0002-4270-7616

