Policy Optimization in Parameter Space
Model-free Reinforcement Learning (RL) has recently demonstrated great potential in solving difficult tasks. However, developing a successful RL model requires extensive tuning and a tremendous number of training samples. Theoretical analyses of these RL methods, policy optimization methods in particular, remain confined to a simplified setting in which learning takes place directly in the policy space. This project aims to extend the analysis of policy optimization methods to the more realistic setting of the parameter space. We will focus mainly on the convergence properties of these methods and on the unification of value and policy in the parameter space. We expect new policy optimization algorithms to emerge from this analysis.
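To illustrate the distinction between the two settings, here is a minimal sketch of parameter-space policy optimization: a REINFORCE-style gradient ascent where the update acts on the parameters theta of a softmax policy rather than on the policy distribution itself. The two-armed bandit, the step size, and the loop length are illustrative assumptions, not part of the project description.

```python
import numpy as np

# Hypothetical two-armed bandit: action 1 pays reward 1, action 0 pays 0.
# The policy pi_theta is a softmax over logits theta, so the optimization
# happens in the *parameter* space (theta in R^2), not directly in the
# space of probability distributions pi.
rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reward(action):
    return 1.0 if action == 1 else 0.0

theta = np.zeros(2)   # policy parameters
alpha = 0.5           # step size (illustrative choice)
for _ in range(200):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)        # sample action from pi_theta
    r = reward(a)
    # REINFORCE gradient: grad_theta log pi(a) = e_a - pi (softmax identity)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi   # ascend expected reward in theta

print(softmax(theta))   # probability mass shifts toward the rewarding arm
```

Analyses in the policy space would instead study updates applied directly to pi; mapping such guarantees through the (generally non-convex) parameterization theta -> pi_theta is part of what makes the parameter-space setting harder.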