Applications of deep learning to large-scale data analysis in mass spectrometry-based proteomics - Year Two

As a result of recent advances in high-throughput technologies, rapidly increasing amounts of mass spectrometry (MS) data present new opportunities as well as challenges for existing analysis methods. Novel computational approaches are needed to take advantage of the latest breakthroughs in high-performance computing for the large-scale analysis of big data from MS-based proteomics. In this project, we aim to develop new applications of deep learning and neural networks for the analysis of MS data.

Precursor charge prediction for improved peptide identification with mass spectrometry

This research project aims to develop an effective method that combines multiple features to improve mass spectrometry-based peptide identification with the database search approach. The project continues the student's previous research on precursor charge state prediction: the predicted charge state is a novel feature with great potential to discriminate between correct and incorrect peptide identifications.
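To illustrate why the precursor charge matters for database search, and how a predicted charge could be used as an additional feature, the following Python sketch may help. The precursor's neutral mass is computed from its observed m/z under an assumed charge z as z * (m/z - proton mass), so a wrong charge points the search at the wrong candidate peptides. The predictor output format (a mapping from charge to probability), the additive log-probability combination in `rescore`, and the `weight` parameter are hypothetical choices for illustration only, not the method developed in this project.

```python
import math
from typing import Dict

PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral precursor mass implied by an observed m/z under an assumed charge."""
    return charge * (mz - PROTON_MASS)


def charge_feature(charge_probs: Dict[int, float], psm_charge: int,
                   eps: float = 1e-6) -> float:
    """Log-probability that the charge assumed by a PSM is correct.

    charge_probs is a hypothetical predictor output mapping charge -> probability;
    eps avoids log(0) for charges the predictor did not consider.
    """
    return math.log(max(charge_probs.get(psm_charge, 0.0), eps))


def rescore(search_score: float, charge_probs: Dict[int, float],
            psm_charge: int, weight: float = 1.0) -> float:
    """Combine a database-search score with the charge feature.

    The additive form and `weight` are illustrative assumptions, not a tuned model.
    """
    return search_score + weight * charge_feature(charge_probs, psm_charge)


if __name__ == "__main__":
    probs = {2: 0.85, 3: 0.15}  # hypothetical predictor output for one spectrum
    print(neutral_mass(600.3, 2), neutral_mass(600.3, 3))
    # A charge-3 match with a slightly higher raw score loses to the charge-2 match
    print(rescore(3.0, probs, 2), rescore(3.2, probs, 3))
```

In this toy example, the charge-3 candidate has the higher raw search score, but the low predicted probability of charge 3 pulls its combined score below that of the charge-2 candidate. This is the kind of discrimination between correct and incorrect identifications that the charge feature is meant to provide. In practice, such features would more likely be combined by a trained classifier than by a fixed weight.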

Development of Loop Modeling and Sidechain Packing Algorithms for Protein Structure Prediction

Proteins play crucial roles in almost every biological process. The function of a protein depends on the specific three-dimensional shape it adopts in nature. The Protein Structure Prediction problem is to predict the tertiary structure of a protein from its amino acid sequence. Experimental methods (NMR spectroscopy and X-ray crystallography) are slow and expensive, so there is an increasing gap between the number of known protein sequences and the number of known protein structures. Computational methods have the potential to rapidly and effectively annotate these sequences with predicted structures.