Optimizing dialog policy with large action spaces using deep reinforcement learning
Abstract
The dialogue policy is responsible for selecting the next appropriate action from the current dialogue state so that the user goal is accomplished efficiently. Most present commercial task-oriented dialogue systems are rule-based and therefore do not scale easily to multiple domains. To design an adaptive dialogue policy, user feedback is an essential input. Recently, deep reinforcement learning algorithms have been widely applied to this problem. However, managing a large state-action space is time-consuming and computationally expensive, and training the dialogue policy requires a reliable, good-quality user simulator, which demands additional design effort. In this paper, we propose a novel approach that improves the performance of the dialogue policy by accelerating training through imitation learning for deep reinforcement learning. We used the proximal policy optimization (PPO) algorithm to model the dialogue policy on the large-scale multi-domain tourist dataset MultiWOZ2.1. The resulting dialogue policy achieved a 91.8% task success rate and an approximately 50% decrease in the average number of turns required to complete a task, without relying on a user simulator in the early training cycles. This approach is expected to help researchers design computationally efficient and scalable dialogue agents by avoiding training from scratch.
Keywords
Deep reinforcement learning; Dialogue policy; Imitation learning; Proximal policy optimization; Task-oriented dialogue system
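The abstract describes warm-starting a PPO-trained dialogue policy with imitation learning on corpus dialogues instead of training against a user simulator from scratch. The sketch below illustrates one way such a pipeline can look; it is not the authors' implementation, and the state/action dimensions, network architecture, and the behaviour-cloning and PPO-update helpers are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): behaviour cloning on
# (state, action) pairs from corpus dialogues, followed by clipped PPO updates.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM = 340    # assumed size of the belief-state feature vector
ACTION_DIM = 200   # assumed number of discrete system dialogue acts

class DialoguePolicy(nn.Module):
    """Feed-forward policy network mapping a dialogue state to action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state):
        return self.net(state)

def pretrain_with_imitation(policy, expert_states, expert_actions, epochs=5, lr=1e-3):
    """Imitation-learning warm start: supervised learning on expert
    (state, action) pairs extracted from annotated corpus dialogues."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        logits = policy(expert_states)
        loss = F.cross_entropy(logits, expert_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

def ppo_update(policy, states, actions, old_log_probs, advantages,
               clip_eps=0.2, lr=3e-4):
    """One clipped-surrogate PPO step on trajectories collected with the
    warm-started policy."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    dist = torch.distributions.Categorical(logits=policy(states))
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    loss = -surrogate.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In a setup of this kind, the behaviour-cloning step supplies a reasonable initial policy, so the subsequent PPO updates start from corpus-like behaviour rather than random exploration, which is what allows training to proceed without a user simulator in the early cycles.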
DOI: http://doi.org/10.11591/ijeecs.v36.i1.pp428-440
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).