## Optimization And Reinforcement Learning Techniques In Multi-agent Graphical Games And Economic Dispatch

##### Abstract

This work discusses optimization and reinforcement learning techniques in power system Economic Dispatch and multi-agent graphical games. Economic Dispatch (ED) is a power system energy management tool that allocates the required power generation among a number of generating units to meet the active load demand [109]. The operating cost of power utilities depends on the fuel cost of the generating units. By optimizing objective functions that depend on the fuel cost, Economic Dispatch yields fuel cost savings [25]. The generation cost functions are either smooth or non-smooth, depending on the nature of the generating units. One source of non-convexity is the set of physical constraints on the generating units, such as spinning reserve, transmission losses, prohibited operating zones, ramp-rate limits, valve-point loading effects, and multiple fuel options [109]. In addition, some generating units have multiple steam valves that open sequentially, which adds ripples to the generation cost function and makes it mathematically harder to handle [4], [5], [56]. Together, these features make the Economic Dispatch problem a large-scale nonlinear constrained optimization problem.

Dynamic graphical games arise in multi-agent dynamical systems where all agents are required to synchronize to the state of a command generator, or leader agent, and the interactions between agents are prescribed by a communication graph structure. Cooperative control refers to the control of dynamical systems interconnected by a communication graph. Synchronization allows each agent of the cooperative team to reach the same state through the proper design of decision and control protocols. In multi-player cooperative games, Nash solutions rely on solving coupled Hamilton-Jacobi equations, whose solution yields the Nash equilibrium. In this work, a new class of multi-agent discrete-time games known as dynamic graphical games is developed.
A new notion of interactive Nash equilibrium is introduced, which holds if all agents are in Nash equilibrium and the graph is strongly connected. Reinforcement Learning (RL) techniques are used to solve these dynamic graphical games online. A set of coupled Riccati recursions is derived to provide offline solutions for the dynamic graphical game. Action-dependent heuristic dynamic programming (ADHDP), or Q-learning, an approximate dynamic programming method, is used to solve the dynamic graphical game without requiring knowledge of the agents' dynamics. In the Q-learning approach, a parametric structure is used to approximate the Q-function of the control policy of each agent. Furthermore, the notion of differential graphical games is developed for continuous-time multi-agent systems. Nash solutions and best response solutions are given in terms of solutions to continuous-time integral reinforcement learning (IRL) Hamilton-Jacobi-Bellman (HJB) equations. Finally, IRL structures are developed to solve the dynamic graphical game using policy iteration.
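
As an illustration of the valve-point ripple mentioned in the abstract, the following minimal sketch uses a form that is common in the Economic Dispatch literature: a quadratic fuel cost plus a rectified sinusoid. The abstract does not state which cost model this work adopts, so the function name `fuel_cost`, the coefficient names `a`, `b`, `c`, `e`, `f`, and all numeric values below are assumptions chosen only for illustration.

```python
import math


def fuel_cost(p, a, b, c, e=0.0, f=0.0, p_min=0.0):
    """Fuel cost of one generating unit at power output p.

    Assumed model (not taken from this work):
        cost = a*p^2 + b*p + c + |e * sin(f * (p_min - p))|
    With e = 0 the cost reduces to the smooth quadratic case.
    """
    smooth = a * p ** 2 + b * p + c                   # smooth quadratic part
    ripple = abs(e * math.sin(f * (p_min - p)))       # valve-point ripple, always >= 0
    return smooth + ripple


# Hypothetical coefficients for a single unit (illustrative values only).
unit = dict(a=0.002, b=10.0, c=500.0, e=300.0, f=0.03, p_min=100.0)

# At p = p_min the sine argument is zero, so only the quadratic part remains.
cost_at_min = fuel_cost(100.0, **unit)

# Away from p_min the ripple adds a non-smooth, non-negative term.
cost_at_150 = fuel_cost(150.0, **unit)
```

The absolute value is what makes the cost non-smooth: the curve has a kink wherever the sine term crosses zero, which is why gradient-based dispatch methods can struggle on such units.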