dc.contributor.advisor	Huber, Manfred
dc.creator	Djurdjevic, Predrag
dc.date.accessioned	2020-06-12T22:15:45Z
dc.date.available	2020-06-12T22:15:45Z
dc.date.created	2020-05
dc.date.issued	2019-12-16
dc.date.submitted	May 2020
dc.identifier.uri	http://hdl.handle.net/10106/29100
dc.description.abstract	In real-world scenarios where situated agents face dynamic, high-dimensional, partially observable environments with action and reward uncertainty, the state space of traditional Reinforcement Learning (RL) quickly becomes prohibitively large for policy learning. In such scenarios, addressing the curse of dimensionality and enabling eventual transfer to closely related tasks is one of the principal challenges and motivations for Hierarchical Reinforcement Learning (HRL). The prime appeal of hierarchical, and particularly recursive, approaches lies in effective factored state, transition, and reward representations which abstract out aspects that are not relevant to subtasks and allow potential transfer of skills that represent solutions to task subspaces. With the advent of deep learning, a wide range of representation learning techniques has become available, mostly in supervised learning applications. However, relatively little has been applied in the context of Hierarchical Reinforcement Learning, where different time scales are important and where limited access to large training data sets and reduced feedback have made learning on these structures difficult. Moreover, the addition of partial observability and the corresponding need to encode memory through recurrent connections further increase this complexity, and very limited work in this direction exists. This dissertation investigates the use of recurrent deep learning structures to automatically learn hierarchical state and policy structures, without the need for supervised data, in the context of Reinforcement Learning problems. In particular, it proposes and evaluates two novel network architectures, one based on Conditional Restricted Boltzmann Machines (CRBM) and one using a Multidimensional Multidirectional Multiscale LSTM network. Experiments using a very sparsely observable version of the common taxi domain problem show the potential of the architectures and illustrate their ability to build hierarchical, reusable representations, both in terms of state representations and learned policy actions.
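
For orientation next to the abstract: the sketch below shows minimal flat (non-hierarchical) tabular Q-learning on the standard taxi domain, the baseline setting that hierarchical approaches like those in the dissertation aim to improve on. This is not the dissertation's method or code; the Gymnasium Taxi-v3 environment, the hyperparameters, and the episode budget are illustrative assumptions, and the dissertation's experiments use a very sparsely observable variant of the domain.

    import gymnasium as gym
    import numpy as np

    # Flat tabular Q-learning baseline on the fully observable Taxi-v3.
    # Illustrative only; the dissertation studies a sparsely observable
    # variant with learned hierarchical state/policy representations.
    env = gym.make("Taxi-v3")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

    for episode in range(5000):  # assumed episode budget
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the tabular Q-values.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # One-step Q-learning update; terminal states bootstrap to 0.
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

Under partial observability, the integer state index above is no longer available; a memory-bearing learned representation (e.g., a recurrent encoding) must stand in for it, which is the gap the proposed CRBM and multiscale LSTM architectures address.
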
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Reinforcement learning
dc.subject	Q-learning
dc.subject	DQN
dc.subject	Long short-term memory
dc.subject	Conditional restricted Boltzmann machines
dc.subject	Transfer learning
dc.subject	POMDP
dc.title	Learning Transferable Meta-Policies for Hierarchical Task Decomposition and Planning Composition
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2020-06-12T22:15:46Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

