This is my first paper, published in the Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017). EACL 2017 was also my first international conference experience!
On-line dialogue policy learning is the key to building evolvable conversational agents in real-world scenarios. A poor initial policy can easily lead to a bad user experience and consequently fail to attract enough real users for policy training. We propose a novel framework, companion teaching, which includes a human teacher in the on-line dialogue policy training loop to address the cold start problem. The dialogue policy is trained using not only the user's reward but also the teacher's example actions and estimated immediate rewards at the turn level. Simulation experiments showed that, with a small number of human teaching dialogues, the proposed approach can effectively improve the user experience in the early stages and smoothly lead to good performance as more user interaction data accumulates.
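As a rough illustration of the idea of mixing the three training signals the abstract mentions (the user's reward, the teacher's example action, and a teacher-estimated immediate reward), here is a toy tabular policy update. This is a hypothetical sketch for intuition only, not the paper's actual model or algorithm; all class and parameter names (`CompanionTaughtPolicy`, `teacher_bonus`, etc.) are invented.

```python
import random


class CompanionTaughtPolicy:
    """Toy tabular dialogue policy (hypothetical sketch, not the paper's model).

    At each turn the action value is updated from three signals: the user's
    reward, a teacher-estimated immediate reward, and an imitation bonus when
    the chosen action matches the teacher's example action.
    """

    def __init__(self, actions, lr=0.1, teacher_bonus=1.0):
        self.actions = actions
        self.lr = lr                      # learning rate
        self.teacher_bonus = teacher_bonus
        self.q = {}                       # (state, action) -> estimated value

    def act(self, state):
        # Greedy action selection with random tie-breaking.
        vals = [self.q.get((state, a), 0.0) for a in self.actions]
        best = max(vals)
        return random.choice([a for a, v in zip(self.actions, vals) if v == best])

    def update(self, state, action, user_reward,
               teacher_action=None, teacher_reward=0.0):
        # Combine the user's reward with the teacher's estimated reward,
        # plus a bonus when the agent agreed with the teacher's example.
        target = user_reward + teacher_reward
        if teacher_action is not None and action == teacher_action:
            target += self.teacher_bonus
        key = (state, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.lr * (target - old)
```

With only a handful of teacher-guided updates, the bonus term pulls the policy toward the teacher's example actions early on, which is the intuition behind easing the cold start.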
Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou and Kai Yu.