Revisiting Natural Actor-Critics with Value Function Approximation - CentraleSupélec
Conference paper - Year: 2010

Revisiting Natural Actor-Critics with Value Function Approximation

Matthieu Geist
Olivier Pietquin

Abstract

Actor-critic architectures have become popular in reinforcement learning over the last decade thanks to the introduction of the policy gradient with function approximation theorem. This theorem allows actor-critic architectures to be combined rationally with value function approximation, and therefore to address large-scale problems. Recent research has replaced the policy gradient with a natural policy gradient, improving the efficiency of the corresponding algorithms. However, a common drawback of these approaches is that they require manipulating the so-called advantage function, which does not satisfy any Bellman equation; as a consequence, deriving actor-critic algorithms is not straightforward. In this paper, we re-derive these theorems in a way that allows reasoning directly with the state-action value function (or Q-function), and thus relying on the Bellman equation again. As a result, new forms of critics can easily be integrated into the actor-critic framework.
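To make the general idea concrete, the following is a minimal illustrative sketch, not the algorithm derived in the paper: an online natural actor-critic in which the Q-function is parameterised as a compatible-feature part plus a state-value part, so the critic can be trained with an ordinary SARSA-style TD update rather than through the advantage function. The toy two-state MDP, the softmax policy, the one-hot features, the step sizes, and the variable names are all assumptions made for illustration only.

```python
import numpy as np

# Illustrative toy MDP (2 states, 2 actions); all values are arbitrary assumptions.
n_states, n_actions, gamma = 2, 2, 0.95
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']: transition probabilities
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[0.0, 0.0],                    # R[s, a]: expected rewards
              [1.0, 2.0]])

def state_features(s):
    """One-hot state features phi(s)."""
    phi = np.zeros(n_states)
    phi[s] = 1.0
    return phi

def policy(theta, s):
    """Softmax (Gibbs) policy over actions for state s."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def compatible_features(theta, s, a):
    """Compatible features psi(s, a) = grad_theta log pi(a | s) for the softmax policy."""
    psi = np.zeros_like(theta)
    psi[s] = -policy(theta, s)
    psi[s, a] += 1.0
    return psi.ravel()

theta = np.zeros((n_states, n_actions))   # actor parameters
w = np.zeros(n_states * n_actions)        # compatible ("advantage") weights
v = np.zeros(n_states)                    # state-value weights
alpha_actor, alpha_critic = 0.01, 0.1     # illustrative step sizes
rng = np.random.default_rng(0)

s = 0
a = rng.choice(n_actions, p=policy(theta, s))
for _ in range(20000):
    s_next = rng.choice(n_actions := n_actions, p=P[s, a]) if False else rng.choice(n_states, p=P[s, a])
    a_next = rng.choice(n_actions, p=policy(theta, s_next))
    r = R[s, a]

    # Q(s, a) is approximated as psi(s, a)^T w + phi(s)^T v, so the critic works on
    # a quantity that satisfies a Bellman equation and can be trained with a plain
    # SARSA-style TD(0) update.
    psi, psi_next = compatible_features(theta, s, a), compatible_features(theta, s_next, a_next)
    phi, phi_next = state_features(s), state_features(s_next)
    td = r + gamma * (psi_next @ w + phi_next @ v) - (psi @ w + phi @ v)
    w += alpha_critic * td * psi
    v += alpha_critic * td * phi

    # Actor step: with compatible features, w points in the direction of the
    # natural policy gradient, so the update is simply theta <- theta + alpha * w.
    theta += alpha_actor * w.reshape(theta.shape)

    s, a = s_next, a_next

print("Learned policy:", [policy(theta, st).round(3) for st in range(n_states)])
```

On this toy problem the policy should concentrate on the higher-reward action in each state; the point of the sketch is only that the critic operates on the Q-function, and the natural-gradient actor update reuses its compatible weights directly.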

Dates and versions

hal-00553870, version 1 (10-01-2011)


Cite

Matthieu Geist, Olivier Pietquin. Revisiting Natural Actor-Critics with Value Function Approximation. MDAI 2010, Oct 2010, Perpignan, France. pp.207-218, ⟨10.1007/978-3-642-16292-3_21⟩. ⟨hal-00553870⟩