Neural Image and Video Understanding

Fakoor, Rasool

dc.contributor.advisor	Huber, Manfred
dc.creator	Fakoor, Rasool
dc.date.accessioned	2017-10-02T15:05:40Z
dc.date.available	2017-10-02T15:05:40Z
dc.date.created	2017-08
dc.date.issued	2017-08-25
dc.date.submitted	August 2017
dc.identifier.uri	http://hdl.handle.net/10106/27000
dc.description.abstract	Although recent neural architectures have shown promising results on tasks such as image recognition, object detection, and playing Atari games, learning a mapping from a visual space to a language space, or vice versa, remains challenging in problems such as image/video captioning and question answering. Furthermore, transferring knowledge between seen and unseen classes in a setting such as zero-shot learning is difficult because the model must make predictions for novel test data belonging to classes for which no examples were seen during training. To address these issues, this dissertation first introduces a novel memory-based attention model for video description. Attention-based models have shown promising results for image captioning. However, they are not able to model the higher-order interactions involved in problems such as video description/captioning, where the relationship between parts of the video and the concepts being depicted is complex. The proposed model uses memories of past attention when reasoning about where to attend in the current time step. Second, this dissertation introduces an end-to-end deep neural network model for attribute-based zero-shot learning with layer-specific regularization that encourages the higher, class-level layers to generalize beyond the training classes. This architecture enables the model to 'transfer' knowledge learned from seen training images to a set of novel, unseen test images.
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Video captioning
dc.subject	Attention model
dc.subject	Deep learning
dc.subject	Transfer learning
dc.subject	Imposing structure
dc.subject	Differentiable memory
dc.title	Neural Image and Video Understanding
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2017-10-02T15:06:45Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

Files in this item

Name: FAKOOR-DISSERTATION-2017.pdf
Size: 8.926Mb
Format: PDF
