Reward Shaping for Statistical Optimisation of Dialogue Management

Layla El Asri; Romain Laroche; Olivier Pietquin

doi:10.1007/978-3-642-39593-2_8

Conference paper. Year: 2013

Layla El Asri

Role: Author
PersonId: 932394

IMS : Information, Multimodalité & Signal

Romain Laroche

Role: Author

Orange Labs [Issy les Moulineaux]

Olivier Pietquin

Role: Author
PersonId: 4024
IdHAL: olivier-pietquin
ORCID: 0000-0002-5386-465X
IdRef: 142821861

IMS : Information, Multimodalité & Signal

Abstract

This paper investigates the impact of reward shaping on a reinforcement learning-based spoken dialogue system's learning. A diffuse reward function gives a reward after each transition between two dialogue states. A sparse function only gives a reward at the end of the dialogue. Reward shaping consists of learning a diffuse function without modifying the optimal policy compared to a sparse one. Two reward shaping methods are applied to a corpus of dialogues evaluated with numerical performance scores. Learning with these functions is compared to the sparse case and it is shown, on simulated dialogues, that the policies learnt after reward shaping lead to higher performance.
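As an illustration of the sparse versus diffuse distinction, here is a minimal sketch of potential-based reward shaping. This is an assumption made for the example: the abstract does not say which two shaping methods the paper uses, so this is a standard technique swapped in, not the authors' method. The dialogue states, the potential values, the final score of 10.0 and the discount factor are all hypothetical.

```python
GAMMA = 0.95  # hypothetical discount factor

# Hypothetical potential over dialogue states; the terminal state has potential 0.
POTENTIAL = {"greet": 1.0, "ask_date": 2.0, "confirm": 5.0, "end": 0.0}


def sparse_rewards(states, final_score):
    """Sparse function: zero everywhere except on the last transition."""
    n_transitions = len(states) - 1
    return [0.0] * (n_transitions - 1) + [final_score]


def shaped_rewards(states, final_score, potential=POTENTIAL, gamma=GAMMA):
    """Diffuse function: add gamma * phi(s') - phi(s) to every transition."""
    base = sparse_rewards(states, final_score)
    return [
        r + gamma * potential[states[t + 1]] - potential[states[t]]
        for t, r in enumerate(base)
    ]


def discounted_return(rewards, gamma=GAMMA):
    """Discounted sum of a reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# One hypothetical dialogue that ends with a performance score of 10.0.
states = ["greet", "ask_date", "confirm", "end"]
sparse = sparse_rewards(states, 10.0)
shaped = shaped_rewards(states, 10.0)
```

Because the shaping terms telescope and the terminal potential is zero, the shaped return of any dialogue starting in `greet` differs from its sparse return only by the constant `POTENTIAL["greet"]`. The ranking of policies, and hence the optimal policy, is therefore unchanged, while every transition now carries a reward signal.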

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00869809

Submitted on: Friday 4 October 2013, 10:57:42

Last modified on: Tuesday 14 February 2023, 03:38:13

Dates and versions

hal-00869809, version 1 (04-10-2013)

Identifiers

HAL Id: hal-00869809, version 1
DOI: 10.1007/978-3-642-39593-2_8

Cite

Layla El Asri, Romain Laroche, Olivier Pietquin. Reward Shaping for Statistical Optimisation of Dialogue Management. SLSP 2013, Jul 2013, Tarragona, Spain. pp.93-101, ⟨10.1007/978-3-642-39593-2_8⟩. ⟨hal-00869809⟩

Collections

SUPELEC CENTRALESUPELEC
