A Design And Analysis Of Computer Experiments-based Approach To Approximate Infinite Horizon Dynamic Programming With Continuous State Spaces
Dynamic programming (DP) is an optimization approach that transforms a complex problem into a sequence of simpler sub-problems solved at different stages. The original DP approach uses Bellman's equation to compute the "cost-to-go" function exactly, which is practical when there are only a few states and decisions. For high-dimensional problems with continuous state spaces, however, the "curse of dimensionality" obstructs the solution, because the size of a discretized state space grows exponentially with the dimension. Given recent advances in computational power, approximate dynamic programming (ADP) addresses this limit by not seeking to compute the future value function exactly at every point of the state space, instead constructing an approximation of the future value function over the domain of the state space. Two components of ADP that remain challenging across existing studies are the discretization of the state space and the estimation of the cost-to-go (future value) function.

The first part of this dissertation research develops a solution method for infinite horizon dynamic programming called the Design and Analysis of Computer Experiments (DACE)-based approach to ADP. Multivariate Adaptive Regression Splines (MARS), a flexible nonparametric statistical modeling tool, is used to approximate future value functions in stochastic dynamic programming (SDP) problems with continuous state variables. The training data set is updated sequentially based on specified conditions. This sequential grid discretization explores the state space and provides a statistically parsimonious ADP methodology that adaptively captures the important variables of the state space. Three different algorithms are presented, distinguished by the conditions of the sampling process for the training data set. Comparisons are presented on a forward simulation with 12 time periods.
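The structure described above — evaluating the Bellman backup only at sampled design points and refitting a flexible approximator of the value function — can be sketched as follows. This is a minimal pure-Python illustration, not the dissertation's method: piecewise-linear interpolation stands in for the MARS approximator, and the one-dimensional dynamics, cost, and disturbance set are illustrative assumptions.

```python
import bisect

def make_interpolator(xs, ys):
    """Piecewise-linear approximator over sorted design points; a simple
    nonparametric stand-in for MARS (an assumption for this sketch)."""
    pts = sorted(zip(xs, ys))
    sx = [p[0] for p in pts]
    sy = [p[1] for p in pts]
    def v(x):
        if x <= sx[0]:
            return sy[0]
        if x >= sx[-1]:
            return sy[-1]
        i = bisect.bisect_right(sx, x)
        t = (x - sx[i - 1]) / (sx[i] - sx[i - 1])
        return (1.0 - t) * sy[i - 1] + t * sy[i]
    return v

def approximate_value_iteration(n_design=41, n_sweeps=60, gamma=0.9):
    """Approximate value iteration: compute the Bellman backup only at the
    design points, then refit the value-function approximator and repeat.
    Dynamics, cost, and noise below are illustrative, not from the study."""
    actions = [-0.5, 0.0, 0.5]
    noise = [-0.1, 0.0, 0.1]                      # discretized disturbance
    cost = lambda x, a: x * x + 0.1 * a * a       # quadratic stage cost
    step = lambda x, a, w: max(-2.0, min(2.0, 0.8 * x + a + w))
    design = [-2.0 + 4.0 * i / (n_design - 1) for i in range(n_design)]
    v = lambda x: 0.0                             # initial approximation
    for _ in range(n_sweeps):
        targets = []
        for x in design:
            backup = min(
                cost(x, a)
                + gamma * sum(v(step(x, a, w)) for w in noise) / len(noise)
                for a in actions)
            targets.append(backup)
        v = make_interpolator(design, targets)    # refit the approximator
    return v

v = approximate_value_iteration()
```

In the dissertation's method the design points themselves are updated sequentially rather than fixed on a uniform grid; the fixed grid here only keeps the sketch short.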
The second part of the dissertation research develops a batch mode Reinforcement Learning (RL) method using MARS as the approximator to solve the same problem as in the first part. The main difference between the two methods is the set of input variables used to approximate the future value function. In the batch mode RL method, the state-action space is used, so the estimated function is a function of both state and action variables. By contrast, the DACE-based ADP uses only state variables, and the estimated future value function depends on state variables alone. A study of state-action discretization is presented, comparing two designs: Monte Carlo sampling and a Sobol' sequence design. Comparisons are presented on the same forward simulation.

The third part develops a two-stage framework for Adaptive Design for Controllability of a System of Plug-in Hybrid Electric Vehicle Charging Stations, presented as a case study. The second-stage dynamic control problem is formulated and initially solved as a mean value problem using linear programming. A DACE approach is then used to develop a metamodel of the second-stage solution over the possible solutions from the first stage. The metamodel is then returned to the first stage, where the final solution is determined. DACE reduces the burden of time-consuming computer models by replacing the loop between the first and second stages with a constraint generated from the gradient of the approximation function. Moreover, the metamodel gives a more accessible description of the second stage.
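The contrast between the two state-action discretization designs can be illustrated with a short sketch. As an assumption to keep it dependency-free, a 2-D Halton sequence stands in for the Sobol' design (both are low-discrepancy sequences), and the state and action ranges are hypothetical, not those of the studied problem.

```python
import random

def van_der_corput(n, base):
    """n-th point of the base-b Van der Corput sequence in [0, 1)."""
    q, denom = 0.0, 1.0
    while n:
        denom *= base
        n, r = divmod(n, base)
        q += r / denom
    return q

def halton_design(n_points):
    """2-D Halton low-discrepancy design on the unit square; a simple
    stand-in for the Sobol' sequence design (an assumption here)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3))
            for i in range(1, n_points + 1)]

def monte_carlo_design(n_points, seed=0):
    """Plain Monte Carlo sampling of the same unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_points)]

def scale(points):
    """Map unit-square points to a hypothetical state-action box:
    state in [-2, 2], action in [-0.5, 0.5] (illustrative ranges)."""
    return [(-2.0 + 4.0 * s, -0.5 + 1.0 * a) for s, a in points]

sa_halton = scale(halton_design(64))   # low-discrepancy design points
sa_mc = scale(monte_carlo_design(64))  # Monte Carlo design points
```

Either design yields (state, action) training inputs for the batch mode RL approximator; the low-discrepancy design fills the state-action box more evenly than independent random draws, which is the trade-off the dissertation's comparison examines.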