Off-policy Learning in Large-scale POMDP-based Dialogue Systems

Lucie Daubigney; Matthieu Geist; Olivier Pietquin

Conference paper, Year: 2012

Lucie Daubigney

Role: Author
PersonId : 908990

Autonomous intelligent machine

IMS : Information, Multimodalité & Signal

Matthieu Geist

Role: Author
PersonId : 6945
IdHAL : matthieu-geist

IMS : Information, Multimodalité & Signal

Olivier Pietquin

Role: Author
PersonId : 4024
IdHAL : olivier-pietquin
ORCID : 0000-0002-5386-465X
IdRef : 142821861

IMS : Information, Multimodalité & Signal

Abstract

Reinforcement learning (RL) is now part of the state of the art in spoken dialogue system (SDS) optimisation. The best-performing RL methods, such as those based on Gaussian processes, must test small changes to the policy in order to judge whether they are improvements or degradations. This process is called on-policy learning. However, it can produce system behaviours that users find unacceptable. Ideally, a learning algorithm should infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, by learning off-policy. Such methods usually fail to scale up and are therefore ill-suited to real-world systems. In this contribution, a sample-efficient, online and off-policy RL algorithm is proposed to learn an optimal policy. The algorithm is combined with a compact non-linear value function representation (namely a multilayer perceptron), which makes it possible to handle large-scale systems.
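The abstract does not name the algorithm, so the following is only a minimal sketch of the general idea it describes: learning a value function off-policy, from interactions logged under a different (acceptable) behaviour policy, with a multilayer perceptron as the value function representation. It uses plain semi-gradient Q-learning as a stand-in, which should not be read as the authors' method. The names `MLPQ`, `td_target` and `learn_from_logs`, the feature vectors, the rewards and all hyperparameters are hypothetical.

```python
import numpy as np


class MLPQ:
    """Tiny one-hidden-layer perceptron: state vector in, one Q-value per action out."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_actions, n_hidden))
        self.b2 = np.zeros(n_actions)

    def q_values(self, state):
        hidden = np.tanh(self.W1 @ state + self.b1)
        return self.W2 @ hidden + self.b2

    def update(self, state, action, target, lr):
        """One semi-gradient step moving Q(state, action) towards a fixed target."""
        hidden = np.tanh(self.W1 @ state + self.b1)
        td_error = target - (self.W2[action] @ hidden + self.b2[action])
        # Backpropagate through the selected output unit only.
        grad_hidden = self.W2[action] * (1.0 - hidden ** 2)
        self.W2[action] += lr * td_error * hidden
        self.b2[action] += lr * td_error
        self.W1 += lr * td_error * np.outer(grad_hidden, state)
        self.b1 += lr * td_error * grad_hidden
        return td_error


def td_target(q_next, reward, done, gamma):
    """Off-policy target: bootstrap from the greedy next action,
    not from the action the behaviour policy actually took."""
    return reward if done else reward + gamma * np.max(q_next)


def learn_from_logs(model, transitions, gamma=0.95, lr=0.05, sweeps=1):
    """Replay logged (state, action, reward, next_state, done) tuples."""
    for _ in range(sweeps):
        for state, action, reward, next_state, done in transitions:
            target = td_target(model.q_values(next_state), reward, done, gamma)
            model.update(state, action, target, lr)
    return model


# Hypothetical logged dialogue turns from a safe, hand-written behaviour policy:
# (state features, action index, reward, next state features, dialogue ended?)
logs = [
    (np.array([1.0, 0.0, 0.0]), 0, 0.0, np.array([0.0, 1.0, 0.0]), False),
    (np.array([0.0, 1.0, 0.0]), 1, 1.0, np.array([0.0, 0.0, 1.0]), True),
]
model = learn_from_logs(MLPQ(n_inputs=3, n_hidden=8, n_actions=2), logs, sweeps=200)
greedy_action = int(np.argmax(model.q_values(logs[0][0])))
```

The point of the sketch is the `max` in `td_target`: the learner evaluates the greedy policy while the data come from another policy, so no exploratory (and possibly unacceptable) system behaviour has to be shown to users during learning.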

Keywords

Spoken Dialogue Systems; Reinforcement Learning

Domains

Machine Learning [cs.LG]; Machine Learning [stat.ML]

Main file

Supelec763.pdf (196.68 KB)

Origin: Files produced by the author(s)

Contributor: Sébastien Van Luchene

https://centralesupelec.hal.science/hal-00684819

Submitted on: Tuesday, 5 June 2012, 08:36:05

Last modified on: Monday, 11 September 2023, 17:41:18

Long-term archiving on: Thursday, 6 September 2012, 02:20:30

Dates and versions

hal-00684819, version 1 (05-06-2012)

Identifiers

HAL Id: hal-00684819, version 1

Cite

Lucie Daubigney, Matthieu Geist, Olivier Pietquin. Off-policy Learning in Large-scale POMDP-based Dialogue Systems. ICASSP 2012, Mar 2012, Kyoto, Japan. pp.4989-4992. ⟨hal-00684819⟩

Collections

SUPELEC CNRS INRIA SUP_IMS CENTRALESUPELEC UNIV-LORRAINE INRIA2 LORIA LORIA-AIS

338 views

301 downloads
