A Dantzig Selector Approach to Temporal Difference Learning

Matthieu Geist; Bruno Scherrer; Alessandro Lazaric; Mohammad Ghavamzadeh

Conference paper, Year: 2012


Matthieu Geist

Role: Author
PersonId : 6945
IdHAL : matthieu-geist

IMS : Information, Multimodalité & Signal

Bruno Scherrer

Role: Author
PersonId : 1406
IdHAL : bruno-scherrer
IdRef : 073360708

Autonomous intelligent machine

Alessandro Lazaric

Role: Author
PersonId : 851
IdHAL : alessandro-lazaric
ORCID : 0000-0002-8970-413X
IdRef : 188701486

Sequential Learning

Mohammad Ghavamzadeh

Role: Author
PersonId : 868946

Sequential Learning

Abstract

LSTD is one of the most popular reinforcement learning algorithms for value function approximation. Whenever the number of features is larger than the number of samples, LSTD must be paired with some form of regularization. In particular, L1-regularization methods tend to perform feature selection by promoting sparsity, and thus they are particularly suited to high-dimensional problems. Nonetheless, since LSTD is not a simple regression algorithm but solves a fixed-point problem, its integration with L1-regularization is not straightforward and might come with some drawbacks (see, e.g., the P-matrix assumption for LASSO-TD). In this paper we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the algorithm and its relationship with existing regularized approaches, showing how it overcomes some of the drawbacks of existing solutions.
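To make the idea in the abstract concrete, here is a minimal sketch of how LSTD can be combined with a Dantzig Selector constraint. It assumes the usual formulation: LSTD solves the linear system A θ = b, with A = Φᵀ(Φ − γΦ′)/n and b = Φᵀr/n, and the Dantzig Selector variant instead minimizes ‖θ‖₁ subject to ‖Aθ − b‖∞ ≤ λ, which is a linear program. The function name `dantzig_lstd`, the argument names, and the synthetic data are all illustrative assumptions; none of them come from this record, and this should not be read as the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_lstd(phi, phi_next, rewards, gamma, lam):
    """Sketch (assumed formulation): min ||theta||_1 s.t. ||A theta - b||_inf <= lam.

    phi, phi_next: (n, d) feature matrices for states and successor states.
    rewards: (n,) observed rewards. gamma: discount factor. lam: tolerance.
    Returns the estimate theta together with the empirical A and b.
    """
    n, d = phi.shape
    # Empirical LSTD quantities; plain LSTD would solve A theta = b.
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    # Write theta = u - v with u, v >= 0 so that the L1 norm is linear.
    c = np.ones(2 * d)
    # A(u - v) - b <= lam   and   -(A(u - v) - b) <= lam
    A_ub = np.vstack([np.hstack([A, -A]), np.hstack([-A, A])])
    b_ub = np.concatenate([lam + b, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    theta = res.x[:d] - res.x[d:]
    return theta, A, b


# Hypothetical synthetic data with more features (d) than samples (n).
rng = np.random.default_rng(0)
n, d = 20, 50
phi = rng.normal(size=(n, d))
phi_next = rng.normal(size=(n, d))
rewards = rng.normal(size=n)

theta, A, b = dantzig_lstd(phi, phi_next, rewards, gamma=0.9, lam=0.1)
print("constraint residual:", np.max(np.abs(A @ theta - b)))
print("non-zero weights:", int(np.sum(np.abs(theta) > 1e-8)))
```

Because the problem is a linear program, any LP solver can be used. In this sketch, `lam` trades off sparsity against how closely the LSTD system is satisfied, and any `lam` at least as large as the largest absolute entry of `b` yields the all-zero solution.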

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00749480

Submitted on: Wednesday, November 7, 2012, 15:57:28

Last modified on: Monday, September 11, 2023, 17:41:18

Dates and versions

hal-00749480 , version 1 (07-11-2012)

Identifiers

HAL Id : hal-00749480 , version 1

Cite

Matthieu Geist, Bruno Scherrer, Alessandro Lazaric, Mohammad Ghavamzadeh. A Dantzig Selector Approach to Temporal Difference Learning. ICML-12, Jun 2012, Edinburgh, United Kingdom. pp.1399-1406. ⟨hal-00749480⟩


Collections

SUPELEC UNIV-LILLE3 CNRS INRIA LAGIS CENTRALESUPELEC UNIV-LORRAINE INRIA2 LORIA LORIA-AIS

