Deep Reinforcement Learning-based Portfolio Management
Abstract
Machine Learning is at the forefront of nearly every field today. Two of its subfields, Reinforcement Learning and Deep Learning, have been combined to produce advanced algorithms that reach or surpass human-level performance, from playing Atari games to defeating a multiple-time world champion at Go. These successes have attracted the interest of the financial community and raised the question of whether such techniques could also detect patterns in the financial markets.
Until recently, mathematical formulations of dynamical systems in the context of Signal Processing and Control Theory have contributed to the success of Financial Engineering. Reinforcement Learning, however, has improved sequential decision making, leading to the development of multistage stochastic optimization, a key component of sequential portfolio optimization (asset allocation) strategies.
In this thesis, we explore how to optimally distribute capital across a given set of stocks in a portfolio so as to maximize the long-term wealth of a Deep Learning trading agent trained with Reinforcement Learning. We treat the problem as context-independent, meaning the learning agent interacts directly with the environment, which allows us to apply model-free Reinforcement Learning algorithms. In particular, we focus on Policy Gradient and Actor-Critic methods, a class of state-of-the-art techniques that construct an estimate of the optimal policy for the control problem by iteratively improving a parametric policy.
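To illustrate the policy-gradient idea described above, the following is a minimal, hypothetical sketch (not the thesis's exact algorithm): a linear policy maps a window of recent price relatives to portfolio weights through a softmax, and we ascend the gradient of the per-period log return. All names, the simulated data, and the linear parameterization are illustrative assumptions.

```python
# Hypothetical sketch of gradient-ascent portfolio optimization with a
# softmax-parameterized policy. Illustrative only; the data is simulated.
import numpy as np

rng = np.random.default_rng(0)
n_assets, window, alpha = 4, 5, 0.05
theta = np.zeros((n_assets, n_assets * window))  # linear policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def allocate(features):
    """Portfolio weights for one period (non-negative, summing to 1)."""
    return softmax(theta @ features)

# Simulated price relatives (price_t / price_{t-1}) fluctuating around 1.0
y = 1.0 + 0.01 * rng.standard_normal((200, n_assets))
log_wealth = 0.0
for t in range(window, len(y) - 1):
    x = y[t - window:t].ravel()             # flattened feature window
    w = allocate(x)
    growth = w @ y[t + 1]                   # portfolio growth this period
    log_wealth += np.log(growth)
    # Exact gradient of log(w . y) through the softmax for a linear policy
    grad_z = w * (y[t + 1] / growth - 1.0)
    theta += alpha * np.outer(grad_z, x)    # gradient ascent step

final_wealth = float(np.exp(log_wealth))
print(final_wealth > 0.0)                   # prints True
```

The softmax keeps the allocation a valid portfolio (non-negative weights summing to one) by construction, which is why it is a common parameterization choice for this problem.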
We perform a comparative analysis of the Reinforcement Learning-based portfolio optimization strategy against the more traditional “Follow the Winner”, “Follow the Loser”, and “Uniformly Balanced” strategies, and find that Reinforcement Learning-based agents either far outperform all the other strategies or perform at least as well as the best of them.
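The baseline strategies named above can be sketched as follows. This is a hypothetical illustration on simulated data, not the thesis's implementation: “Uniformly Balanced” rebalances to equal weights each period, “Follow the Winner” puts all weight on the best recent performer (momentum), and “Follow the Loser” on the worst (mean reversion).

```python
# Hypothetical sketches of the three baseline strategies (illustrative only).
import numpy as np

def run(strategy, y):
    """Compound wealth over price relatives y (T x n), starting from 1.0."""
    n = y.shape[1]
    w, wealth = np.full(n, 1.0 / n), 1.0
    for t in range(1, len(y)):
        wealth *= w @ y[t]                 # apply this period's growth
        w = strategy(w, y[:t + 1])         # rebalance for the next period
    return wealth

def uniform(w, hist):
    """Uniformly Balanced: equal weights every period."""
    return np.full(len(w), 1.0 / len(w))

def follow_winner(w, hist):
    """All weight on the asset with the best latest price relative."""
    e = np.zeros(len(w))
    e[np.argmax(hist[-1])] = 1.0
    return e

def follow_loser(w, hist):
    """All weight on the asset with the worst latest price relative."""
    e = np.zeros(len(w))
    e[np.argmin(hist[-1])] = 1.0
    return e

rng = np.random.default_rng(1)
y = 1.0 + 0.02 * rng.standard_normal((250, 4))  # simulated price relatives
for name, s in [("uniform", uniform), ("winner", follow_winner),
                ("loser", follow_loser)]:
    print(name, round(run(s, y), 4))
```

On real data, which baseline wins depends on whether the market exhibits momentum or mean reversion, which is what makes an adaptive, learned policy attractive.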
The analysis provides conclusive support for the ability of model-free Policy Gradient-based Reinforcement Learning methods to act as universal trading agents.
Related items
Showing items related by title, author, creator and subject.
- The Effectiveness Of A Virtual Learning Environment On Student Learning About Clinical Skills
  Youn, Eric (Social Work, 2007-09-19) The purpose of this study was to explore whether social work students could increase their clinical skills knowledge in a Clinical Skills Virtual Learning Environment (CS-VLE). The evaluation of an online exercise that ...
- Effects Of Experiential Learning On Face To Face And Multimedia Learning Environments In The Acquisition Of Active Listening Skills
  Huerta Wong, Juan Enrique (Social Work, 2008-04-22) Comparison studies between Technology or Multimedia Learning Environments and Face-to-Face (F2F) Learning Environments in social work education usually show that technology works as well as F2F environments. However, they ...
- Learning State And Action Space Hierarchies For Reinforcement Learning Using Action-Dependent Partitioning
  Asadi, Mehran (Computer Science & Engineering, 2007-08-23) Autonomous systems are often difficult to program. Reinforcement learning (RL) is an attractive alternative, as it allows the agent to learn behavior on the basis of sparse, delayed reward signals provided only when the ...