Statistical Linearization for Value Function Approximation in Reinforcement Learning
Abstract
Reinforcement learning (RL) is a machine learning approach to the optimal control problem. It consists of learning an optimal control policy through interactions with the system to be controlled, the quality of this policy being quantified by the so-called value function. An important RL subtopic is the approximation of this function when the system is too large for an exact representation. This paper presents statistical-linearization-based approaches to estimating such functions. Compared to more classical approaches, this allows considering nonlinear parameterizations as well as the Bellman optimality operator, which induces some differentiability problems. Moreover, the statistical point of view adopted here allows considering colored observation noise models instead of the classical white one; in RL, this can prove useful.