
dc.contributor.author: Djurdjevic, Predrag
dc.date.accessioned: 2008-08-08T02:31:15Z
dc.date.available: 2008-08-08T02:31:15Z
dc.date.issued: 2008-08-08T02:31:15Z
dc.date.submitted: April 2008
dc.identifier.other: DISS-2071
dc.identifier.uri: http://hdl.handle.net/10106/976
dc.description.abstract: Reinforcement learning agents situated in real-world environments have to address a number of challenges in order to succeed at accomplishing a wide range of tasks over their lifetime. Among these, such systems have to be able to extract control knowledge from already learned tasks and apply it to subsequent ones, allowing the agent to accomplish the new task faster and to accelerate the learning of an optimal policy. To address skill reuse and skill transfer, a number of approaches using hierarchical state and action spaces have been introduced recently which build on the idea of transferring previously learned policies and representations to model and control the new task. However, while such transfer of skills can significantly improve learning times, it also poses the risk of "behavior proliferation", where the growing set of available reusable actions makes it increasingly difficult to determine a strategy for a new task. To address this issue, it is important for the agent to be able to analyze new tasks and to have a means of predicting the utility of an action or skill in a new context prior to learning a policy for the task. The former implies an ability to decompose the new task into known subtasks, while the latter implies the availability of an informed exploration policy used to find the new goal and to learn a corresponding policy more efficiently. This thesis presents a novel approach for learning task decomposition by learning to predict the utility of subgoals and subgoal types in the context of the new task, as well as for exploration shaping by predicting the likelihood with which each available action is useful in the given task context. To achieve this, the approach presented here uses past learning experiences to acquire a set of utility functions that encode relevant knowledge about useful subgoals and skills, and applies them to shape the search for the optimal policy for the new task. Acceleration is achieved by focusing the search on contextually identifiable subgoals and actions/skills that have been learned to be valuable in the context of optimal policies in previously encountered worlds. Performance gains are achieved both in the time required to reach the task's goal for the first time and in the time required to learn an optimal policy, which is demonstrated in the context of navigation and manipulation tasks in a grid-world domain.
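
As a rough illustration of the exploration-shaping idea described in the abstract, the following sketch (not taken from the thesis) shows a grid-world Q-learning loop in which exploratory actions are drawn from a learned prior over action usefulness rather than uniformly. The action set, the prior values, and the env object's reset()/step() interface are assumptions made for illustration only.

    # Minimal sketch of exploration shaping in grid-world Q-learning.
    # Action names, the prior, and the env interface are illustrative assumptions.
    import random
    from collections import defaultdict

    ACTIONS = ["up", "down", "left", "right"]

    # Hypothetical prior learned from earlier tasks: estimated likelihood that
    # each action is useful in this kind of task context.
    action_prior = {"up": 0.4, "down": 0.1, "left": 0.1, "right": 0.4}

    def shaped_exploration_policy(q, state, epsilon=0.2):
        """Epsilon-greedy, but exploratory actions are sampled from the learned
        prior instead of uniformly, biasing search toward actions that were
        valuable in previously solved tasks."""
        if random.random() < epsilon:
            actions, weights = zip(*action_prior.items())
            return random.choices(actions, weights=weights)[0]
        return max(ACTIONS, key=lambda a: q[(state, a)])

    def q_learning_episode(env, q, alpha=0.1, gamma=0.95, epsilon=0.2):
        """One episode of tabular Q-learning using the shaped exploration policy.
        env is assumed to expose reset() -> state and step(action) -> (state, reward, done)."""
        state = env.reset()
        done = False
        while not done:
            action = shaped_exploration_policy(q, state, epsilon)
            next_state, reward, done = env.step(action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state

    # Usage sketch (assuming a grid-world env object as described above):
    # q = defaultdict(float)
    # for _ in range(200):
    #     q_learning_episode(env, q)

The only change from plain epsilon-greedy Q-learning is the weighted draw in shaped_exploration_policy; informed priors of this kind are what lets previously acquired knowledge focus the search for the new task's goal.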
dc.description.sponsorship: Huber, Manfred
dc.language.iso: EN
dc.publisher: Computer Science & Engineering
dc.title: Hierarchical Reinforcement Learning Using Automatic Task Decomposition And Exploration Shaping
dc.type: M.S.
dc.contributor.committeeChair: Huber, Manfred
dc.degree.department: Computer Science & Engineering
dc.degree.discipline: Computer Science & Engineering
dc.degree.grantor: University of Texas at Arlington
dc.degree.level: masters
dc.degree.name: M.S.
dc.identifier.externalLink: https://www.uta.edu/ra/real/editprofile.php?onlyview=1&pid=190
dc.identifier.externalLinkDescription: Link to Research Profiles

