dc.contributor.author (en_US): Chiu, Po-Hsiang
dc.date.accessioned: 2011-10-11T20:48:18Z
dc.date.available: 2011-10-11T20:48:18Z
dc.date.issued: 2011-10-11
dc.date.submitted (en_US): January 2011
dc.identifier.other (en_US): DISS-11337
dc.identifier.uri: http://hdl.handle.net/10106/6150
dc.description.abstract (en_US): Learning by trial and error, and the ability to form levels of abstraction from past experience, have been important factors in allowing sentient beings to develop intelligent behaviors and cope with an ever-changing environment. In a similar way, complex control domains often require the interacting agents to learn adaptive control strategies for time-varying or potentially evolving systems. This dissertation begins by investigating an example of such complex domains, Grid computing networks, through a collaborative effort in the design and implementation of a generic workload management system, the PanDA-PF WMS used in the ATLAS experiment. To boost the performance of the PanDA-PF WMS and increase its applicability in a general resource-sharing environment, we subsequently motivate an automated and adaptive learning approach that optimizes computational resource usage. From the experience of developing Grid applications such as PanDA-PF, we found that even a flexible infrastructure has performance limits from the perspectives of both high-performance computing (HPC) and high-throughput computing (HTC). The key observation is that an optimal resource allocation strategy is highly contingent upon many factors hidden in the intricate dynamics behind the scenes, including the task distribution, the real-time resource profile, and the compatibility between user tasks and the allocated machines. The reinforcement learning framework establishes a unique way of solving a wide range of control and planning tasks through a state-space representation of the system, over which the control policy unfolds as a sequence of control decisions toward a maximum payoff. Intuitively, reinforcement learning seems to be an ideal candidate among machine learning methods for developing an optimal resource allocation strategy that harvests free computational resources by learning their intricate dynamics. However, our hope of applying standard reinforcement learning in the context of resource allocation is diminished by an inherent limitation in its representation. In particular, the control policy is often formulated from the perspective of decision-theoretic planning (DTP), such that actions, as control decisions, are assumed to be atomic with fixed semantics. Consequently, the derived policy in general lacks the ability to adapt to variations in the action outcomes, or in the action set itself, as the system changes. This is a major barrier to learning an ideal resource allocation strategy, where each compute resource is often characterized by time-varying properties that determine its performance. In addition, the available resources may be highly volatile, depending on the resource-sharing infrastructure. In a dynamic computational cluster, for instance, the underlying resources are acquired on demand as distributed virtual machines that may not be persistently available to end users. As a consequence, the optimal task-assignment strategy learned earlier may not be strictly applicable in the future. Inspired by the challenges of complex domains like optimal resource sharing, this dissertation progressively develops an extended reinforcement learning framework with a concept-driven learning architecture that enables adaptive policy learning over abstractions of progressively evolving samples of experience.
In particular, we provide an alternative view of reinforcement learning by establishing the notion of the reinforcement field through a collection of policy-embedded particles gathered during the policy learning process. The reinforcement field serves as a policy generalization mechanism over correlated decisions through the use of kernel functions as a state correlation hypothesis, in combination with Gaussian process regression as a value function approximator. Subsequently, by "kernelizing" the spectral clustering mechanism, the policy-learning experience retained in the agent's memory can be further subdivided into a set of concept-driven abstract actions, each of which implicitly encodes a set of context-dependent local policies. We show in a simulated task-assignment domain that our generalized reinforcement learning framework enables both learning an action-oriented conceptual model and simultaneously deriving an optimal policy from the high-level conceptual units. Moreover, to demonstrate the general applicability of our learning approach, we apply the work to a generalized navigation domain, a gridworld without the grid, in which the agent is free to move in all directions with stochastic action outcomes, and we present the learning results in terms of both an improved learning curve and reinforcement field plots.
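
As a concrete illustration of the reinforcement learning formulation described in the abstract, where a policy unfolds as a sequence of control decisions toward a maximum payoff, the following is a minimal tabular Q-learning sketch for a toy task-assignment setting. It is only an illustrative sketch: the task-type states, the machine actions, the affinity-based reward model, and all parameter values are assumptions made for this example and are not taken from the dissertation.

    import random

    # Minimal tabular Q-learning sketch for a toy task-assignment MDP.
    # The state is the type of the arriving task; the action is the machine
    # it is assigned to.  The affinity matrix and reward model are hypothetical.
    n_task_types, n_machines = 3, 4
    alpha, gamma, epsilon, episodes = 0.1, 0.9, 0.1, 5000

    Q = [[0.0] * n_machines for _ in range(n_task_types)]
    affinity = [[random.random() for _ in range(n_machines)] for _ in range(n_task_types)]

    def reward(task_type, machine):
        # Noisy payoff reflecting how well the machine suits the task type.
        return affinity[task_type][machine] + random.gauss(0, 0.05)

    for _ in range(episodes):
        s = random.randrange(n_task_types)              # a task arrives
        if random.random() < epsilon:                   # epsilon-greedy exploration
            a = random.randrange(n_machines)
        else:
            a = max(range(n_machines), key=lambda m: Q[s][m])
        r = reward(s, a)
        s_next = random.randrange(n_task_types)         # next task arrival
        # Q-learning backup: move the estimate toward the observed payoff plus
        # the discounted value of the best follow-up decision.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

    policy = [max(range(n_machines), key=lambda m: Q[s][m]) for s in range(n_task_types)]
    print("greedy machine per task type:", policy)

The limitation noted in the abstract shows up directly here: the Q-table assumes a fixed, atomic action set with fixed semantics, so if machines appear, disappear, or change their properties, the learned assignments no longer apply.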
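
The reinforcement field and concept-driven abstraction described in the abstract can be sketched, under assumed details, with standard components: an RBF kernel as the state-correlation hypothesis, Gaussian process regression as the value function approximator, and spectral clustering of the same kernel (Gram) matrix to subdivide the retained experience into abstract groups. The kernel choice, the synthetic value targets, and the crude two-way spectral split below are illustrative assumptions, not the dissertation's actual construction.

    import numpy as np

    # Illustrative sketch: an RBF kernel as the state-correlation hypothesis,
    # Gaussian process regression for value estimates over visited states
    # ("particles"), and spectral clustering of the same kernel matrix to
    # group the experience into abstract units.  All details are assumed.
    def rbf_kernel(X, Y, length_scale=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * length_scale ** 2))

    rng = np.random.default_rng(0)
    states = rng.uniform(0, 10, size=(60, 2))                   # visited states
    values = np.sin(states[:, 0]) + 0.1 * rng.normal(size=60)   # synthetic returns

    # GP regression: posterior mean of the value function at query states.
    K = rbf_kernel(states, states)
    weights = np.linalg.solve(K + 1e-2 * np.eye(len(states)), values)
    queries = rng.uniform(0, 10, size=(5, 2))
    v_hat = rbf_kernel(queries, states) @ weights
    print("approximate values at query states:", np.round(v_hat, 3))

    # Spectral clustering on the kernel (Gram) matrix: eigenvectors of the
    # normalized graph Laplacian give an embedding in which the experience can
    # be partitioned; here, a crude two-way split on the first nontrivial mode.
    d = K.sum(axis=1)
    L = np.eye(len(K)) - (K / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]
    _, eigvecs = np.linalg.eigh(L)
    labels = (eigvecs[:, 1] > np.median(eigvecs[:, 1])).astype(int)
    print("abstract-group sizes:", np.bincount(labels))

Using the same kernel for both value approximation and clustering mirrors the idea that the correlation structure over states drives both the generalization of decisions and the formation of higher-level conceptual units.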
dc.description.sponsorship (en_US): Huber, Manfred
dc.language.iso (en_US): en
dc.publisher (en_US): Computer Science & Engineering
dc.title (en_US): Generalized Reinforcement Learning With An Application In Generic Workload Management Systems
dc.type (en_US): Ph.D.
dc.contributor.committeeChair (en_US): Huber, Manfred
dc.degree.department (en_US): Computer Science & Engineering
dc.degree.discipline (en_US): Computer Science & Engineering
dc.degree.grantor (en_US): University of Texas at Arlington
dc.degree.level (en_US): doctoral
dc.degree.name (en_US): Ph.D.

