Deep learning for predicting mutational effects in protein structures - ON-165
Preferred Disciplines: Computer Science, Biochemistry. Graduate degree (MSc or Ph.D.)
Company: ProteinQure Inc.
Project Length: 4-6 months (1 unit)
Desired start date: ASAP
Location: Toronto, ON
No. of Positions: 1
Preferences: Language: English
About the Company:
ProteinQure is a Toronto-based startup building a software platform for computational peptide drug discovery. We combine quantum computing, molecular simulations and machine learning for the structure-based design of drugs. A physics-based approach is less data-dependent and enables us to develop therapeutics for complicated disease targets. We are one of the top graduates of the Quantum Machine Learning stream at the Creative Destruction Lab incubator (University of Toronto) and have partnerships with several quantum computing hardware providers (Rigetti, IBM, Xanadu, D-Wave and Fujitsu).
ProteinQure is combining cutting-edge computational technologies to assist with the design of protein-based therapeutics.
Predicting the effect of sequence mutations in proteins enables a richer understanding of protein disease targets, genetic diseases, and rational protein design. Mutations can be broadly classified as stabilizing or destabilizing when introduced into a fixed protein scaffold of interest, or a precise numerical score can be attributed to a specific change. Numerous computational approaches (Rosetta, FoldX), as well as machine learning models, have been developed to predict these effects with high accuracy, but these tools are fundamentally limited by training dataset sizes. However, a significant source of data is being generated from biophysical models of proteins and is not routinely used for learning.
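To make the two labelling schemes concrete, the following is a minimal sketch of how a numerical score could be mapped to the coarse stabilizing/destabilizing classes. It assumes the score is a change in folding free energy (ddG, in kcal/mol) with the convention that negative values are stabilizing; the sign convention differs between datasets and must be checked for each source. The 0.5 kcal/mol threshold, the extra "neutral" class, and the example mutations and values are hypothetical, chosen only for illustration.

```python
def classify_mutation(ddg: float, threshold: float = 0.5) -> str:
    """Map a ddG value (kcal/mol) to a coarse stability label.

    Assumed convention: ddG = dG_fold(mutant) - dG_fold(wild type),
    so a negative ddG means the mutant is more stable. Values within
    +/- threshold are treated as neutral (an illustrative choice).
    """
    if ddg <= -threshold:
        return "stabilizing"
    if ddg >= threshold:
        return "destabilizing"
    return "neutral"


# Hypothetical mutations and ddG values, for illustration only.
example_ddg = {"A12G": 1.8, "L45F": -0.9, "K7R": 0.1}
labels = {name: classify_mutation(value) for name, value in example_ddg.items()}

for name, label in labels.items():
    print(f"{name}: {label}")
```

A regression model would predict the ddG value directly, while a classifier would be trained on labels such as these.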
The objective of this project is to develop a machine learning model for the prediction of mutational effects using protein structure as the primary input variable. Protein structural data includes the 3D coordinates of atoms as well as chemical information about the atom types (chemical element, charge). Several papers in the field of deep learning have employed 3D structures or molecular graphs to predict the properties of small molecules, but limited work has been done on proteins. This project will entail detailed research of the methods employed in computational chemistry and an examination of the transferability of these approaches to the prediction of mutational effects in proteins. Machine learning models will be validated and benchmarked against existing computational methods.
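As an illustration of what structure-based input can look like, the sketch below turns atom coordinates, elements and charges into a simple graph: one feature vector per atom and an adjacency matrix connecting atoms closer than a distance cutoff. This is only one possible featurization, not the method the project will necessarily adopt. The element vocabulary, the 2.0 Å cutoff, the function name `featurize_atoms` and the three-atom example are assumptions made for illustration.

```python
import numpy as np

# Assumed element vocabulary for the one-hot encoding (illustrative only).
ELEMENTS = ["C", "N", "O", "S"]


def featurize_atoms(coords, elements, charges, cutoff=2.0):
    """Build node features and a distance-cutoff adjacency matrix.

    coords:   (n, 3) atom positions in angstroms
    elements: n element symbols, each present in ELEMENTS
    charges:  n partial or formal charges
    cutoff:   atoms closer than this distance are connected (assumed value)
    """
    coords = np.asarray(coords, dtype=float)
    n_atoms = len(coords)

    # One-hot encode the element, then append the charge as a final column.
    one_hot = np.zeros((n_atoms, len(ELEMENTS)))
    for i, symbol in enumerate(elements):
        one_hot[i, ELEMENTS.index(symbol)] = 1.0
    charge_column = np.asarray(charges, dtype=float)[:, None]
    node_features = np.hstack([one_hot, charge_column])

    # Pairwise distances via broadcasting; exclude self-connections.
    diff = coords[:, None, :] - coords[None, :, :]
    distances = np.linalg.norm(diff, axis=-1)
    adjacency = (distances < cutoff) & ~np.eye(n_atoms, dtype=bool)
    return node_features, adjacency


# Hypothetical three-atom fragment placed along the x axis.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [4.0, 0.0, 0.0]]
elements = ["N", "C", "O"]
charges = [-0.3, 0.1, -0.5]

node_features, adjacency = featurize_atoms(coords, elements, charges)
print(node_features.shape)
print(adjacency.astype(int))
```

The resulting arrays are the kind of input a graph convolution consumes; a 3D convolutional approach would instead place the same atoms on a voxel grid.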
- Reproduction of several well-studied machine learning models used to predict chemical properties
- Collection and curation of a dataset of protein mutations used for training (both static structures and unreleased simulated data from ProteinQure)
- Research and development of methods for the featurization of protein structure (direct spatial coordinates, 3D convolutional neural networks, graph convolutions, etc.)
- Training of machine learning model and initial hyperparameter search using ProteinQure GPU resources
- Validation and benchmarking of prediction accuracy compared to existing approaches
- Integration of machine learning model into ProteinQure design platform for the optimization of peptides
- Existing molecular machine learning models should be examined within the DeepChem software package, which uses TensorFlow.
- Mutational effect data (thermostability, chaperone misfolding) will be compiled from online databases and paper supplemental information (ProTherm, Rosetta ddG dataset). Protein structures will be obtained from the Protein Data Bank (PDB).
- Methods for the featurization of proteins will be derived from existing scientific literature, and features will be extracted from our structural dataset using Python.
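The validation and benchmarking step listed above could be sketched as follows: predictions from a new model and from an existing method are compared against measured values using root-mean-square error and Pearson correlation. These two metrics are a common choice for regression benchmarks, but they are an assumption here, not a stated requirement of the project. All numbers are made up for illustration and do not come from any real dataset or tool.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    return covariance / (spread_p * spread_m)


# Hypothetical ddG values (kcal/mol), for illustration only.
measured = [1.0, -0.5, 2.0, 0.0]
predictions = {
    "new_model": [0.8, -0.2, 1.5, 0.3],
    "existing_method": [0.2, 0.5, 1.0, 1.0],
}

for name, predicted in predictions.items():
    print(f"{name}: RMSE={rmse(predicted, measured):.2f}, r={pearson(predicted, measured):.2f}")
```

In practice the comparison should be made on held-out mutations that neither method has seen during training or parameter fitting.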
Expertise and Skills Needed:
- Machine learning experience (TensorFlow, PyTorch, etc.)
- Python (NumPy, SciPy, etc.)
- Familiarity with molecular structure (Preferred)
For more info or to apply to this applied research position, please