Analysis of socio-demographics with missing values in the UK Biobank

Data acquisition at scale implies missing values, in the biological signals as well as in the demographic and questionnaire data. These missing values are structured (missingness appears in blocks) and often causal (for example, more health information is missing for lower-income individuals). While the clinical trial literature offers many works on the treatment of missing values, little has been done so far on the specific case of the UK Biobank cross-sectional data, or on how the missing-data strategy affects the estimation of the statistical links between behavioral or clinical assessments and imaging phenotypes.
The goal of this project is to investigate new strategies for handling missing data in population imaging datasets. A standard practice for treating missing values is to apply multiple imputation procedures. For predictive studies, recent theoretical results show how imputation should be combined with predictive models and cross-validation procedures to give a prediction that is optimal when the data are missing at random. However, in population imaging, data are seldom missing at random.
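As a rough illustration of combining imputation with cross-validation, the sketch below fits a simple mean imputer on each training fold only and then applies it to the held-out fold, so that no information from the test rows leaks into the imputed values. It is a minimal sketch, not the project's method: the mean imputer stands in for multiple imputation, the added 0/1 missingness flags are one common way to let a downstream model use the missingness pattern when data are not missing at random, and all function names and the toy (age, income) rows are hypothetical.

```python
import random
from statistics import mean


def fit_mean_imputer(rows):
    """Learn per-column means from the observed (non-None) values only."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column is entirely missing in this fold.
        means.append(mean(observed) if observed else 0.0)
    return means


def impute_with_indicator(rows, means):
    """Fill None with the given means and append one 0/1 missingness flag per column."""
    out = []
    for r in rows:
        filled = [means[j] if v is None else v for j, v in enumerate(r)]
        flags = [1.0 if v is None else 0.0 for v in r]
        out.append(filled + flags)
    return out


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_folds(X, k=3, seed=0):
    """Yield (train, test) pairs, imputed with statistics from the training fold only."""
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [X[i] for i in range(len(X)) if i not in held_out]
        test = [X[i] for i in test_idx]
        means = fit_mean_imputer(train)  # fitted without looking at the test rows
        yield impute_with_indicator(train, means), impute_with_indicator(test, means)


# Hypothetical toy rows: (age, income), with None marking a missing value.
X = [
    [52.0, 31000.0],
    [47.0, None],
    [None, 28000.0],
    [61.0, 45000.0],
    [58.0, None],
    [39.0, 52000.0],
]
folds = list(cross_validated_folds(X, k=3))
```

Any predictive model would then be trained on each imputed training fold and scored on the matching test fold. The point of the design is that the imputer is treated as part of the model and refitted inside every fold.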

Faculty Supervisor:

Jean-Baptiste Poline

Student:

Partner:

École des ponts ParisTech

Discipline:

Computer science

Sector:

Education

University:

McGill University

Program:

Globalink Research Award
