Development of an AI-first molecular database to accelerate drug discovery

The project aims to develop a database of molecular compounds to accelerate drug discovery. Compounds shared by chemical providers are currently stored in large library files. Because of their size and number, these files are a bottleneck in virtual screening. The molecular database will gather them in a single centralised store, making library processing more efficient. The project will also develop partner relationships by offering a better data-sharing experience through a user interface and a programmatic client. Besides storing known compounds, the database will implement a feature for enumerating unknown compounds. Combined with an active learning model, this feature will let experts use the database to manually explore chemical subspaces and find hit compounds. The last phase of the project will focus on extending open-source work to provide a privacy-preserving machine learning framework, built on the molecular database, that enables federated learning pipelines for virtual screening.

Sacha Levy
Faculty Supervisor: 
Reihaneh Rabbany
Partner University: