A statistical method for competing risk survival analysis with clustered big data

Over the last few years, a data revolution has taken place with the emergence of big data. In the medical field, the term big data refers to databases that are large in the number of patients, in the amount of information gathered from varied sources, or both. Such data are heterogeneous, however, because they arise from different medical centers. Furthermore, traditional statistical methods cannot be applied directly to these large databases: the major problems are multicollinearity and overfitting. Many regularization methods have been proposed to adapt classical methods. Mittal et al. have taken up the challenge of adapting survival analysis methods to these emerging data sets. Survival analysis consists of modelling the time to an event in the presence of censoring (cases where the event is not observed). One of the main assumptions of the most popular survival model is non-informative censoring, which means that censoring is independent of the event time.

Faculty Supervisor:

Mary Thompson

Student:

Marie DE ANTONIO

Discipline:

Statistics / Actuarial sciences

University:

University of Waterloo

Program:

Globalink Research Award