dc.contributor.advisor	Huber, Manfred
dc.creator	Djurdjevic, Predrag
dc.date.accessioned	2020-06-12T22:15:45Z
dc.date.available	2020-06-12T22:15:45Z
dc.date.created	2020-05
dc.date.issued	2019-12-16
dc.date.submitted	May 2020
dc.identifier.uri	http://hdl.handle.net/10106/29100
dc.description.abstract	In real-world scenarios where situated agents face dynamic, high-dimensional, partially observable environments with action and reward uncertainty, the state space of traditional Reinforcement Learning (RL) quickly becomes prohibitively large for policy learning. In such scenarios, addressing the curse of dimensionality and enabling eventual transfer to closely related tasks is one of the principal challenges and motivations for Hierarchical Reinforcement Learning (HRL). The prime appeal of hierarchical, and particularly recursive, approaches lies in effective factored state, transition, and reward representations which abstract out aspects that are not relevant to subtasks and allow potential transfer of skills that represent solutions to task subspaces. With the advent of deep learning, a wide range of representation learning techniques has become available, mostly in supervised learning applications. However, relatively little has been applied in the context of Hierarchical Reinforcement Learning, where different time scales are important and where limited access to large training data sets and reduced feedback have made learning on these structures difficult. Moreover, the addition of partial observability and the corresponding need to encode memory through recurrent connections further increase this complexity, and very limited work in this direction exists. This dissertation investigates the use of recurrent deep learning structures to automatically learn hierarchical state and policy structures, without the need for supervised data, in the context of Reinforcement Learning problems. In particular, it proposes and evaluates two novel network architectures, one based on Conditional Restricted Boltzmann Machines (CRBM) and one using a Multidimensional Multidirectional Multiscale LSTM network. Experiments using a very sparsely observable version of the common taxi domain problem show the potential of the architectures and illustrate their ability to build hierarchical, reusable representations, both in terms of state representations and learned policy actions.
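
For orientation next to the abstract: the sketch below shows minimal flat (non-hierarchical) tabular Q-learning on the standard taxi domain, the baseline setting that hierarchical approaches like those in the dissertation aim to improve on. This is not the dissertation's method or code; the Gymnasium Taxi-v3 environment, the hyperparameters, and the episode budget are illustrative assumptions, and the dissertation's experiments use a very sparsely observable variant of the domain.

    import gymnasium as gym
    import numpy as np

    # Flat tabular Q-learning baseline on the fully observable Taxi-v3.
    # Illustrative only; the dissertation studies a sparsely observable
    # variant with learned hierarchical state/policy representations.
    env = gym.make("Taxi-v3")
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # assumed hyperparameters

    for episode in range(5000):  # assumed episode budget
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the tabular Q-values.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # One-step Q-learning update; terminal states bootstrap to 0.
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state

Under partial observability, the integer state index above is no longer available; a memory-bearing learned representation (e.g., a recurrent encoding) must stand in for it, which is the gap the proposed CRBM and multiscale LSTM architectures address.
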
dc.format.mimetype	application/pdf
dc.language.iso	en_US
dc.subject	Reinforcement learning
dc.subject	Q-learning
dc.subject	DQN
dc.subject	Long short-term memory
dc.subject	Conditional restricted Boltzmann machines
dc.subject	Transfer learning
dc.subject	POMDP
dc.title	Learning Transferable Meta-Policies for Hierarchical Task Decomposition and Planning Composition
dc.type	Thesis
dc.degree.department	Computer Science and Engineering
dc.degree.name	Doctor of Philosophy in Computer Science
dc.date.updated	2020-06-12T22:15:46Z
thesis.degree.department	Computer Science and Engineering
thesis.degree.grantor	The University of Texas at Arlington
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy in Computer Science
dc.type.material	text

