Agent-Aware Dropout DQN for Safe and Efficient On-line Dialogue Policy Learning

Our new paper, which integrates hand-crafted rules with reinforcement learning, has been accepted at the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). It proposes an Agent-Aware Dropout Deep Q-Network (AAD-DQN) that estimates the uncertainty of the learning process.
Dialogue Systems, Reinforcement Learning, Original Research
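The general idea of using dropout to estimate uncertainty can be sketched roughly as follows. This is a generic Monte Carlo dropout illustration, not the AAD-DQN algorithm from the paper. The toy network, its random weights, `q_values`, `choose_action`, the agreement threshold, and the rule of deferring to a hand-crafted action when the sampled greedy actions disagree are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy Q-network (one hidden layer, random weights, illustration only).
STATE_DIM, HIDDEN, N_ACTIONS = 8, 32, 4
W1 = rng.normal(scale=0.3, size=(STATE_DIM, HIDDEN))
W2 = rng.normal(scale=0.3, size=(HIDDEN, N_ACTIONS))


def q_values(state, drop_p=0.5):
    """One stochastic forward pass with (inverted) dropout kept on at inference."""
    h = np.maximum(0.0, state @ W1)          # ReLU hidden layer
    mask = rng.random(HIDDEN) >= drop_p      # keep each unit with prob 1 - drop_p
    h = h * mask / (1.0 - drop_p)            # rescale kept units (inverted dropout)
    return h @ W2                            # one Q-value per action


def choose_action(state, rule_action, n_samples=20, agree_threshold=0.7):
    """Hypothetical policy: act greedily if the dropout samples mostly agree,
    otherwise fall back to the action proposed by a hand-crafted rule."""
    greedy = [int(np.argmax(q_values(state))) for _ in range(n_samples)]
    counts = np.bincount(greedy, minlength=N_ACTIONS)
    best = int(np.argmax(counts))
    agreement = counts[best] / n_samples     # fraction of samples voting for `best`
    if agreement >= agree_threshold:
        return best, agreement, "agent"
    return rule_action, agreement, "rule"


state = rng.normal(size=STATE_DIM)
action, agreement, source = choose_action(state, rule_action=0)
print(action, round(float(agreement), 2), source)
```

Low agreement across dropout samples is treated here as a sign that the Q-network is still uncertain, which is when a safety-minded agent might prefer the rule-based action.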

Policy Optimization with Monotonic Improvement Guarantee

This article covers the theoretical derivation of the policy improvement bound and the practical policy optimization algorithms discussed in the paper Trust Region Policy Optimization (Schulman et al., 2015). TRPO is an interesting idea: it optimizes policies with a guaranteed monotonic improvement. In theory, its algorithm design looks elegant and justified. In...
Reinforcement Learning, Original Research
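For reference, the bound as stated in Schulman et al. (2015) reads as follows, with $\eta$ the expected discounted return, $\rho_\pi$ the discounted state visitation frequencies, $A_\pi$ the advantage function, and $L_\pi$ the local surrogate objective. The last two lines sketch why maximizing the penalized surrogate $M_i$ yields monotonic improvement.

```latex
\begin{aligned}
L_\pi(\tilde\pi) &= \eta(\pi) + \sum_s \rho_\pi(s) \sum_a \tilde\pi(a \mid s)\, A_\pi(s, a) \\
\eta(\tilde\pi) &\ge L_\pi(\tilde\pi) - C\, D_{\mathrm{KL}}^{\max}(\pi, \tilde\pi),
  \qquad C = \frac{4 \epsilon \gamma}{(1 - \gamma)^2},
  \qquad \epsilon = \max_{s,a} \lvert A_\pi(s, a) \rvert \\
M_i(\pi) &= L_{\pi_i}(\pi) - C\, D_{\mathrm{KL}}^{\max}(\pi_i, \pi),
  \qquad \eta(\pi_{i+1}) \ge M_i(\pi_{i+1}), \qquad \eta(\pi_i) = M_i(\pi_i) \\
\Rightarrow\ \eta(\pi_{i+1}) - \eta(\pi_i) &\ge M_i(\pi_{i+1}) - M_i(\pi_i) \ge 0
  \quad \text{when } \pi_{i+1} = \arg\max_\pi M_i(\pi)
\end{aligned}
```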

On-line Dialogue Policy Learning with Companion Teaching

This is my first paper, published in the Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017). EACL 2017 was also my first international conference!
Dialogue Systems, Reinforcement Learning, Human-in-the-Loop, Original Research