Related projects
Discover more projects across a range of sectors and disciplines, from AI to cleantech to social innovation.
Model-free reinforcement learning (RL) has recently demonstrated great potential for solving difficult tasks that require intelligence. However, developing a successful RL model requires extensive tuning and a very large number of training samples. Theoretical analysis of these RL methods, and of policy optimization methods in particular, remains limited to a simplified setting in which learning happens directly in the policy space. This project aims to extend the analysis of policy optimization methods to a more realistic setting in which learning happens in the parameter space. We will focus mainly on the convergence properties of the model and on the unification of value and policy in the parameter space. New policy optimization algorithms are expected to emerge from this analysis.
Dale Schuurmans
Jincheng Mei
Borealis AI
Computer science
Information and communications technologies
Accelerate
The strong support from governments across Canada, international partners, universities, colleges, companies, and community organizations has enabled Mitacs to focus on the core idea that talent and partnerships power innovation, and innovation creates a better future.