High-dimensional Adaptive Dynamic Programming With Mixed Integer Linear Programming
Dynamic programming (DP, Bellman 1957) is a classic mathematical programming approach to solve multistage decision problems. The “Bellman equation” uses a recursive concept that includes both the current contribution and future contribution in the objective function of an optimization. The method has potential to represent dynamic decision-making systems, but an exact DP solution algorithm is limited to small problems with restrictions, such as problems with linear transitions and problems without uncertainty. Approximate dynamic programming (ADP) is a modern branch of DP that seeks to achieve numerical solutions via approximation. It is can be applied to real-world DP problems, but there are still challenges for high dimensions. This dissertation focuses on ADP value function approximation for a continuous-state space using the statistical perspective (Chen et al. 1999). Two directions of ADP methodology are developed: a sequential algorithm to explore the state space, and a sequential algorithm using mixed integer linear programming (MILP) and regression trees.The first component addresses exploration of the state space. A sequential state space exploration (SSSE) algorithm (Fan 2008) was developed using neural networks. Here it is considered the use of multivariate adaptive regression splines (Friedman 1991) in place of neural networks. In ADP, the value function approximation is defined over a specified state space region. In the real world, the relevant state space region is unknown. In particular, the ADP approach employed in this dissertation uses a statistical perspective that is analogous to design and analysis of computer experiments (DACE, Chen et al. 2006). In DACE, an experimental design is used to discretize the input space region, which is the state space region in ADP. Since the ADP state space region is unknown, SSSE uses the stochastic trajectories to sample future states and identify the potential range of system state. By reducing iterations without impacting solution quality, SSSE using MARS demonstrates improved efficiency over SSSE with neural networks.The second component of this dissertation addresses the optimization of a real world, complex, dynamic system. This work is motivated by a case study on the environmental impact of aircraft deicing activities at the Dallas-Fort Worth (D/FW) International Airport. For this case study, the state transitions are nonlinear, the objective function is nonconvex, and the decision (action) space is high-dimensional and discrete. To overcome these complexities, an ADP method is introduced using a piecewise linear value function approximation and MILP. The piecewise linear structure can be transformed for MILP by introducing binary variables. The treed regression value function approximation is flexible enough to approximate the nonlinear structure of the data and structured to be easily formulated in MILP. The proposed ADP approach is compared with a reinforcement learning (Murphy 2005) approach.