Policy Optimization in Parameter Space

Model-free Reinforcement Learning (RL) has recently demonstrated great potential for solving difficult intelligent tasks. However, developing a successful RL model requires extensive tuning and a tremendous number of training samples. Theoretical analyses of these RL methods, and of policy optimization methods in particular, have so far been confined to a simplified setting where learning happens directly in the policy space. This project attempts to advance the analysis of policy optimization methods to the more realistic setting of learning in the parameter space. We will focus mainly on the convergence properties of such methods and on the unification of value and policy in the parameter space. New policy optimization algorithms are expected to originate from this analysis.
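To make the policy-space/parameter-space distinction concrete, the sketch below runs gradient ascent on the parameters of a softmax policy for a simple bandit problem, rather than updating the policy probabilities directly. The reward values, step size, and iteration count are illustrative assumptions, not part of the project description.

```python
import numpy as np

# Hypothetical 3-armed bandit rewards (illustrative values).
r = np.array([1.0, 0.8, 0.2])

def softmax(theta):
    """Map parameters theta to a policy (probability distribution)."""
    z = np.exp(theta - theta.max())  # shift for numerical stability
    return z / z.sum()

theta = np.zeros(3)  # policy parameters; updates happen in this space
eta = 0.1            # learning rate (assumed)

for _ in range(2000):
    pi = softmax(theta)
    # Exact gradient of the expected reward E_{a~pi}[r_a] with respect
    # to the softmax parameters: grad_a = pi_a * (r_a - pi . r).
    grad = pi * (r - pi @ r)
    theta += eta * grad

print(softmax(theta).round(3))  # probability mass concentrates on the best arm
```

Note that the update acts on `theta`, not on the probabilities `pi`; the policy changes only through the softmax map, which is exactly the setting whose convergence behavior this project aims to analyze.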

Faculty Supervisor: Dale Schuurmans

Jincheng Mei

Borealis AI

Computer science

Information and communications technologies



