Conference Paper. Year: 2008

Kalman Temporal Differences: Uncertainty and Value Function Approximation

Abstract

This paper deals with value (and Q-) function approximation in deterministic Markovian decision processes (MDPs). A general statistical framework based on the Kalman filtering paradigm is introduced. Its principle is to adopt a parametric representation of the value function, to model the associated parameter vector as a random variable, and to minimize the mean-squared error of the parameters conditioned on past observed transitions. From this general framework, called Kalman Temporal Differences (KTD), a family of algorithms is derived using an approximation scheme known as the unscented transform. Unlike most function approximation schemes, this framework inherently provides uncertainty information about the value function, which can be notably useful for handling the exploration/exploitation dilemma.
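
The principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a linear parameterization of the value function, for which the reward observation is linear in the parameters, so a standard Kalman update is used here in place of the unscented transform. The function names (`ktd_linear_update`, `run_chain`), the noise settings, the prior covariance, and the three-state deterministic chain are all assumptions made for illustration.

```python
def ktd_linear_update(theta, P, phi_s, phi_next, reward, gamma=0.9,
                      obs_noise=0.01, process_noise=0.0):
    """One Kalman update of the parameter mean `theta` and covariance `P`.

    Assumed model (illustrative, linear case only):
      theta_i = theta_{i-1} + evolution noise          (random walk)
      r_i     = (phi(s) - gamma * phi(s'))^T theta_i + observation noise
    """
    n = len(theta)
    # Prediction step: the random-walk model inflates the covariance.
    P_pred = [[P[i][j] + (process_noise if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Observation vector derived from the Bellman evaluation equation.
    h = [phi_s[i] - gamma * phi_next[i] for i in range(n)]
    r_pred = sum(h[i] * theta[i] for i in range(n))
    # Covariance between parameters and reward, and variance of the reward.
    P_theta_r = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    P_r = sum(h[i] * P_theta_r[i] for i in range(n)) + obs_noise
    # Correction step: the Kalman gain weights the temporal-difference error.
    K = [P_theta_r[i] / P_r for i in range(n)]
    td_error = reward - r_pred
    theta_new = [theta[i] + K[i] * td_error for i in range(n)]
    P_new = [[P_pred[i][j] - K[i] * P_r * K[j] for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


def one_hot(i, n):
    """Tabular features: with these, V(s) is simply theta[s]."""
    return [1.0 if k == i else 0.0 for k in range(n)]


def run_chain(cycles=500, gamma=0.9):
    """Hypothetical deterministic chain 0 -> 1 -> 2 -> 0, reward 1 when leaving state 2."""
    n = 3
    theta = [0.0] * n                                  # prior mean
    P = [[10.0 if i == j else 0.0 for j in range(n)]   # prior covariance (assumed)
         for i in range(n)]
    for _ in range(cycles):
        for s in range(n):
            s_next = (s + 1) % n
            reward = 1.0 if s == n - 1 else 0.0
            theta, P = ktd_linear_update(theta, P, one_hot(s, n),
                                         one_hot(s_next, n), reward, gamma)
    return theta, P


theta, P = run_chain()
# With one-hot features, the variance of V(s) is the diagonal entry P[s][s].
value_std = [P[i][i] ** 0.5 for i in range(3)]
print("estimated values:", theta)
print("value standard deviations:", value_std)
```

In this sketch the diagonal of `P` gives a per-state variance of the value estimate, which is the kind of uncertainty information the abstract refers to; how such information would be used for exploration is left open here.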
No file deposited

Dates and versions

hal-00351298, version 1 (08-01-2009)

Identifiers

  • HAL Id: hal-00351298, version 1

Cite

Matthieu Geist, Olivier Pietquin, Gabriel Fricout. Kalman Temporal Differences: Uncertainty and Value Function Approximation. NIPS Workshop on Model Uncertainty and Risk in Reinforcement Learning, Dec 2008, Vancouver, Canada. ⟨hal-00351298⟩
