An information-theoretic framework for understanding generalization in neural networks

Deep neural networks (DNNs) are a class of machine learning algorithms inspired by biological neural networks. DNNs are general function approximators, which is why they can be applied to almost any machine learning problem. Their applications include visual object recognition in computer vision, unsupervised text translation, etc. DNNs are, in principle, prone to overfitting because they usually have many more parameters than there are training examples. In practice, however, they usually achieve low error on test data. This surprising fact has motivated the scientific community to study the generalization performance of DNNs. Nevertheless, previous attempts have not led to a satisfactory answer to the question of why DNNs generalize well. In this project, we aim to introduce an information-theoretic framework that lets us find a promising answer to this question.

Faculty Supervisor:

Daniel Roy

Student:

Mahdi Haghifam

Partner:

Element AI

Discipline:

Visual arts

Sector:

Information and communications technologies

University:

Program: