Statistically linearized least-squares temporal differences
Abstract
A common drawback of standard reinforcement learning algorithms is their inability to scale up to real-world problems. For this reason, an important current research trend is (state-action) value function approximation. A prominent value function approximator is the least-squares temporal differences (LSTD) algorithm. However, for technical reasons, linearity is mandatory: the parameterization of the value function must be linear (ruling out compact nonlinear representations), and only the Bellman evaluation operator can be considered (imposing policy-iteration-like schemes). In this paper, this restriction of LSTD is lifted thanks to a derivative-free statistical linearization approach. This allows nonlinear parameterizations as well as the Bellman optimality operator to be handled, the latter enabling value-iteration-like schemes. The efficiency of the resulting algorithms is demonstrated using a linear parameterization and neural networks, as well as on a Q-learning-like problem. A theoretical analysis is also provided.
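For concreteness, the following is a minimal sketch of the generic statistical-linearization principle referred to above; the notation ($f$, $\theta$, $\bar\theta$, $P_\theta$, $P_{\theta f}$) is illustrative and not taken from the abstract. Given a nonlinear mapping $f(\theta)$ and a random parameter vector $\theta$ with mean $\bar\theta$ and covariance $P_\theta$, statistical linearization seeks the affine approximation minimizing the expected squared error:
\[
f(\theta) \approx A\,(\theta - \bar\theta) + \bar{f},
\qquad
A = P_{\theta f}^{\top} P_\theta^{-1},
\]
where $\bar{f} = \mathrm{E}[f(\theta)]$ and $P_{\theta f} = \mathrm{E}\big[(\theta - \bar\theta)(f(\theta) - \bar{f})^{\top}\big]$. Because these statistics can be estimated from sampled (sigma) points rather than Jacobians, the linearization is derivative-free, which is what permits nonlinear parameterizations and the (nondifferentiable) Bellman optimality operator.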