Runzhe Yang | Personal Website

Policy Optimization with Monotonic Improvement Guarantee

Posted on May 25, 2017

This article walks through the theoretical derivation of the Policy Improvement Bound and the practical policy optimization algorithms discussed in the paper Trust Region Policy Optimization (Schulman et al., 2015). TRPO optimizes policies with a guarantee of monotonic improvement. In theory, its algorithm design looks elegant and justified. In...

Reinforcement Learning, Original Research

"Tony" Runzhe Yang

