Deep Reinforcement Learning-based Portfolio Management
Abstract
Machine Learning is at the forefront of nearly every field today. Two of its subfields, Reinforcement Learning and Deep Learning, have been combined to produce advanced algorithms that reach or surpass human-level performance, from playing Atari games to defeating a multiple-time world champion at Go. These successes have attracted the interest of the financial community and raised the question of whether such techniques could also detect patterns in the financial markets.
Until recently, mathematical formulations of dynamical systems in the context of Signal Processing and Control Theory have contributed to the success of Financial Engineering. Reinforcement Learning, however, has improved sequential decision making, leading to the development of multistage stochastic optimization, a key component of sequential portfolio optimization (asset allocation) strategies.
In this thesis, we explore how to optimally distribute capital across a given set of stocks in a portfolio so as to maximize the long-term wealth of a Deep Learning trading agent trained with Reinforcement Learning. We treat the problem as context-independent, meaning the learning agent interacts directly with the environment, which allows us to apply model-free Reinforcement Learning algorithms. In particular, we focus on Policy Gradient and Actor-Critic methods, a class of state-of-the-art techniques that construct an estimate of the optimal policy for the control problem by iteratively improving a parametric policy.
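To illustrate the policy-gradient idea described above, the following is a minimal, hypothetical sketch (not the thesis's exact algorithm): a linear policy maps a window of recent price relatives to portfolio weights through a softmax, and we ascend the gradient of the per-period log return. All names, the simulated data, and the linear parameterization are illustrative assumptions.

```python
# Hypothetical sketch of gradient-ascent portfolio optimization with a
# softmax-parameterized policy. Illustrative only; the data is simulated.
import numpy as np

rng = np.random.default_rng(0)
n_assets, window, alpha = 4, 5, 0.05
theta = np.zeros((n_assets, n_assets * window))  # linear policy parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def allocate(features):
    """Portfolio weights for one period (non-negative, summing to 1)."""
    return softmax(theta @ features)

# Simulated price relatives (price_t / price_{t-1}) fluctuating around 1.0
y = 1.0 + 0.01 * rng.standard_normal((200, n_assets))
log_wealth = 0.0
for t in range(window, len(y) - 1):
    x = y[t - window:t].ravel()             # flattened feature window
    w = allocate(x)
    growth = w @ y[t + 1]                   # portfolio growth this period
    log_wealth += np.log(growth)
    # Exact gradient of log(w . y) through the softmax for a linear policy
    grad_z = w * (y[t + 1] / growth - 1.0)
    theta += alpha * np.outer(grad_z, x)    # gradient ascent step

final_wealth = float(np.exp(log_wealth))
print(final_wealth > 0.0)                   # prints True
```

The softmax keeps the allocation a valid portfolio (non-negative weights summing to one) by construction, which is why it is a common parameterization choice for this problem.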
We perform a comparative analysis of the Reinforcement Learning-based portfolio optimization strategy against the more traditional “Follow the Winner”, “Follow the Loser”, and “Uniformly Balanced” strategies, and find that Reinforcement Learning-based agents either far outperform all the other strategies or perform at least as well as the best of them.
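The baseline strategies named above can be sketched as follows. This is a hypothetical illustration on simulated data, not the thesis's implementation: “Uniformly Balanced” rebalances to equal weights each period, “Follow the Winner” puts all weight on the best recent performer (momentum), and “Follow the Loser” on the worst (mean reversion).

```python
# Hypothetical sketches of the three baseline strategies (illustrative only).
import numpy as np

def run(strategy, y):
    """Compound wealth over price relatives y (T x n), starting from 1.0."""
    n = y.shape[1]
    w, wealth = np.full(n, 1.0 / n), 1.0
    for t in range(1, len(y)):
        wealth *= w @ y[t]                 # apply this period's growth
        w = strategy(w, y[:t + 1])         # rebalance for the next period
    return wealth

def uniform(w, hist):
    """Uniformly Balanced: equal weights every period."""
    return np.full(len(w), 1.0 / len(w))

def follow_winner(w, hist):
    """All weight on the asset with the best latest price relative."""
    e = np.zeros(len(w))
    e[np.argmax(hist[-1])] = 1.0
    return e

def follow_loser(w, hist):
    """All weight on the asset with the worst latest price relative."""
    e = np.zeros(len(w))
    e[np.argmin(hist[-1])] = 1.0
    return e

rng = np.random.default_rng(1)
y = 1.0 + 0.02 * rng.standard_normal((250, 4))  # simulated price relatives
for name, s in [("uniform", uniform), ("winner", follow_winner),
                ("loser", follow_loser)]:
    print(name, round(run(s, y), 4))
```

On real data, which baseline wins depends on whether the market exhibits momentum or mean reversion, which is what makes an adaptive, learned policy attractive.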
The analysis provides conclusive support for the ability of model-free Policy Gradient-based Reinforcement Learning methods to act as universal trading agents.
Related items
Showing items related by title, author, creator and subject.
- The Effectiveness Of A Virtual Learning Environment On Student Learning About Clinical Skills
  Youn, Eric (Social Work, 2007-09-19) The purpose of this study was to explore whether social work students could increase their clinical skills knowledge in a Clinical Skills Virtual Learning Environment (CS-VLE). The evaluation of an online exercise that ...
- Effects Of Experiential Learning On Face To Face And Multimedia Learning Environments In The Acquisition Of Active Listening Skills
  Huerta Wong, Juan Enrique (Social Work, 2008-04-22) Comparison studies between Technology or Multimedia Learning Environments and Face-to-Face (F2F) Learning Environments in social work education usually show that technology works as well as F2F environments. However, they ...
- Learning State And Action Space Hierarchies For Reinforcement Learning Using Action-Dependent Partitioning
  Asadi, Mehran (Computer Science & Engineering, 2007-08-23) Autonomous systems are often difficult to program. Reinforcement learning (RL) is an attractive alternative, as it allows the agent to learn behavior on the basis of sparse, delayed reward signals provided only when the ...