Kalman Temporal Differences: the deterministic case

Matthieu Geist; Olivier Pietquin; Gabriel Fricout

doi:10.1109/ADPRL.2009.4927543

Conference paper. Year: 2009



Matthieu Geist

Role: Author
PersonId: 6945
IdHAL: matthieu-geist

SUPELEC-Campus Metz

ArcelorMittal Maizières Research SA

Olivier Pietquin

Role: Author
PersonId: 4024
IdHAL: olivier-pietquin
ORCID: 0000-0002-5386-465X
IdRef: 142821861

SUPELEC-Campus Metz

Gabriel Fricout

Role: Author

ArcelorMittal Maizières Research SA

Abstract

This paper deals with value function and $Q$-function approximation in deterministic Markovian decision processes. A general statistical framework based on the Kalman filtering paradigm is introduced. Its principle is to adopt a parametric representation of the value function, to model the associated parameter vector as a random variable and to minimize the mean-squared error of the parameters conditioned on past observed transitions. From this general framework, which will be called Kalman Temporal Differences (KTD), and using an approximation scheme called the unscented transform, a family of algorithms is derived, namely KTD-V, KTD-SARSA and KTD-Q, which aim respectively at estimating the value function of a given policy, the $Q$-function of a given policy and the optimal $Q$-function. The proposed approach holds for linear and nonlinear parameterization. This framework is discussed and potential advantages and shortcomings are highlighted.
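The principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear parameterization with one-hot features, a random-walk model for the parameters, and a hypothetical deterministic three-state chain. All names (`features`, `ktd_v_linear`, `prior_var`, `process_noise`, `obs_noise`) are illustrative. With a linear parameterization the observed reward is linear in the parameters, so a standard Kalman update suffices and no unscented transform is needed; the nonlinear case would require it.

```python
# Sketch of a Kalman-style value estimator under stated assumptions:
# linear parameterization, one-hot features, random-walk parameter model.
# The chain MDP and all names are hypothetical, not taken from the paper.

def features(state, n_states):
    """One-hot features; any state index >= n_states (terminal) maps to zeros."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]

def ktd_v_linear(transitions, n_states, gamma=0.9,
                 prior_var=10.0, process_noise=0.0, obs_noise=1e-3):
    """Estimate V for a fixed policy from (s, r, s_next) transitions."""
    n = n_states
    theta = [0.0] * n  # mean of the parameter vector
    # Covariance of the parameter vector, initialised to prior_var * I.
    P = [[prior_var if i == j else 0.0 for j in range(n)] for i in range(n)]
    for s, r, s_next in transitions:
        # Prediction: the random-walk model only inflates the covariance.
        for i in range(n):
            P[i][i] += process_noise
        # Observation model: r = (phi(s) - gamma * phi(s_next))^T theta + noise.
        phi, phi_next = features(s, n), features(s_next, n)
        h = [a - gamma * b for a, b in zip(phi, phi_next)]
        r_pred = sum(hi * ti for hi, ti in zip(h, theta))
        Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        innov_var = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
        gain = [x / innov_var for x in Ph]
        # Correction: shift the mean by gain * innovation, shrink the covariance.
        delta = r - r_pred
        theta = [t + k * delta for t, k in zip(theta, gain)]
        P = [[P[i][j] - gain[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta

# Deterministic chain 0 -> 1 -> 2 -> terminal (index 3), reward 1 on the last step.
transitions = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 3)]
theta = ktd_v_linear(transitions * 20, n_states=3)
```

With a discount factor of 0.9, the exact values of this chain are 0.81, 0.9 and 1 for states 0, 1 and 2, and `theta` should land close to them. The covariance `P` is what distinguishes this scheme from a plain temporal-difference update: it carries the uncertainty on the parameters and sets the gain applied to each observed transition.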

Domains

Signal and Image Processing [eess.SP]

Main file

Supelec471.pdf (146.26 KB)

Origin: Publisher files allowed on an open archive

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00380870

Submitted on: Wednesday, May 6, 2009, 09:44:31

Last modified on: Tuesday, February 14, 2023, 03:38:23

Long-term archiving on: Thursday, June 10, 2010, 22:41:48

Dates and versions

hal-00380870, version 1 (May 6, 2009)

Identifiers

HAL Id: hal-00380870, version 1
DOI: 10.1109/ADPRL.2009.4927543

Cite

Matthieu Geist, Olivier Pietquin, Gabriel Fricout. Kalman Temporal Differences: the deterministic case. ADPRL 2009, Mar 2009, Nashville, TN, United States. pp.185-192, ⟨10.1109/ADPRL.2009.4927543⟩. ⟨hal-00380870⟩


Collections

SUPELEC CENTRALESUPELEC

