Thejani Gamage: Reinforcement Learning for the optimal dividend problem - continued
Speaker
Thejani Gamage (UMass Amherst)
Abstract
We study the optimal dividend problem with a restricted dividend rate, first under the continuous-time diffusion model and then under the well-known “Cramér-Lundberg” model. Unlike the standard literature, we are particularly interested in the case where the model parameters are unknown, so that the optimal control cannot be determined explicitly. To approximate the optimal strategy, we use methods from the Reinforcement Learning (RL) literature, specifically by solving the corresponding entropy-regularized exploratory control problem. We first carry out a theoretical analysis of this exploratory control problem, focusing on the associated Hamilton-Jacobi-Bellman (HJB) equation. We then use a policy improvement argument, together with policy evaluation devices, to construct sequences of strategies that approximate the optimal one. Finally, we present numerical results using different parametrization families for the cost functional, both to illustrate the effectiveness of the approximation schemes and to discuss possible ways to improve the policy evaluation methods.
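For orientation, the diffusion case described in the abstract can be sketched as follows. This is a minimal illustration in the standard notation for such problems, not the speaker's exact formulation: the symbols $\mu$, $\sigma$, $c$, $\bar a$, $\lambda$, $\tau$ and $\pi$ are assumptions that do not appear in the abstract.

```latex
% Assumed notation: surplus X_t, drift \mu, volatility \sigma, discount rate c,
% maximal dividend rate \bar a, ruin time \tau, temperature \lambda > 0.

% Classical problem with restricted dividend rate a_t \in [0, \bar a]:
\[
  dX_t = (\mu - a_t)\,dt + \sigma\,dW_t, \qquad
  V(x) = \sup_{a}\, \mathbb{E}_x\!\left[\int_0^{\tau} e^{-ct} a_t\,dt\right],
  \qquad \tau = \inf\{t \ge 0 : X_t < 0\}.
\]

% Exploratory version: the control is replaced by a density \pi_t on [0, \bar a],
% and an entropy bonus with weight \lambda rewards exploration:
\[
  dX_t = \Big(\mu - \int_0^{\bar a} w\,\pi_t(w)\,dw\Big)dt + \sigma\,dW_t, \qquad
  J(x,\pi) = \mathbb{E}_x\!\left[\int_0^{\tau} e^{-ct}
     \int_0^{\bar a} \big(w - \lambda \ln \pi_t(w)\big)\pi_t(w)\,dw\,dt\right].
\]

% Associated HJB equation for the exploratory value function v, with v(0) = 0:
\[
  \tfrac12 \sigma^2 v''(x) + \mu\, v'(x) - c\, v(x)
  + \sup_{\pi} \int_0^{\bar a} \Big( w\big(1 - v'(x)\big) - \lambda \ln \pi(w) \Big)\pi(w)\,dw = 0 .
\]

% The supremum is attained by a Gibbs (truncated exponential) density on [0, \bar a]:
\[
  \pi^*(w; x) = \frac{\exp\!\big( w\,(1 - v'(x))/\lambda \big)}
                     {\int_0^{\bar a} \exp\!\big( u\,(1 - v'(x))/\lambda \big)\,du},
  \qquad 0 \le w \le \bar a .
\]
```

Under these assumptions, a policy improvement step would replace $v$ in the last formula by the value function of the current policy, which is why policy evaluation is needed when $\mu$ and $\sigma$ are unknown.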