Understanding Empirical Risk Minimization via Information Theory

Deep neural networks (DNNs) are a class of machine learning models inspired by biological neural networks. Learning with DNNs has enjoyed enormous empirical success in recent years across a wide variety of tasks. Lately, many researchers in the machine learning community have become interested in the generalization mystery: why do overparameterized DNNs perform well on previously unseen data, even though they have far more parameters than training samples? The information-theoretic approach to studying generalization is one framework for answering this question. Although the information-theoretic approach has proven applicable to several machine learning methods, it suffers from shortcomings that have hindered progress toward understanding generalization in DNNs. In this project, we aim to improve information-theoretic methods for generalization, which would let us find a promising answer to the question of why DNNs generalize well in practice.
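To make the framework concrete, the following is a minimal sketch of one well-known information-theoretic generalization bound (the Xu–Raginsky mutual-information bound, which controls the expected generalization gap by sqrt(2·sigma_sg²·I(W;S)/n)) on a toy problem. The problem setup, parameter values, and the noisy mean estimator are illustrative assumptions, not part of this project's proposed method.

```python
import numpy as np

# Toy illustration of the Xu-Raginsky bound:
#   |E[gen gap]| <= sqrt(2 * sigma_sg^2 * I(W; S) / n)
# Task: estimate a Gaussian mean; the learner outputs the sample mean plus
# Gaussian noise, which makes I(W; S) computable in closed form.
# All constants below (mu, sigma, sigma_w, B) are illustrative assumptions.

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # data distribution: Z ~ N(mu, sigma^2)
n = 50                 # training-set size
sigma_w = 0.3          # std of noise added to the learner's output
B = 4.0                # clip the squared loss to [0, B] => (B/2)-sub-Gaussian

def loss(w, z):
    # Clipped squared loss, bounded in [0, B].
    return np.minimum((w - z) ** 2, B)

# For W = mean(S) + N(0, sigma_w^2) with mean(S) ~ N(mu, sigma^2/n),
# the mutual information is exactly (in nats):
#   I(W; S) = 0.5 * ln(1 + (sigma^2 / n) / sigma_w^2)
mi = 0.5 * np.log(1.0 + (sigma**2 / n) / sigma_w**2)
bound = float(np.sqrt(2 * (B / 2) ** 2 * mi / n))

# Monte Carlo estimate of the expected generalization gap
# (true risk on fresh data minus empirical risk on the training set).
trials = 2000
test = rng.normal(mu, sigma, size=20000)  # fresh samples for the true risk
gaps = []
for _ in range(trials):
    s = rng.normal(mu, sigma, size=n)
    w = s.mean() + rng.normal(0.0, sigma_w)  # noisy ERM output
    gaps.append(loss(w, test).mean() - loss(w, s).mean())
gap = float(np.mean(gaps))

print(f"I(W;S) = {mi:.4f} nats, bound = {bound:.4f}, estimated gap = {gap:.4f}")
```

On this toy problem the measured gap sits well below the bound; the project's motivation is precisely that for DNNs such mutual-information quantities are hard to control, leaving the bounds loose or vacuous.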

Faculty Supervisor: Ashish Khisti


Mahdi Haghifam


Element AI


Engineering - computer / electrical


Professional, scientific and technical services


University of Toronto


