Leveraging unlabelled, off-task data to improve ASR for low-resource languages based on the transferability of acoustic features learned by deep neural networks

Deep neural networks (DNNs) for automatic speech recognition (ASR) require large amounts of labelled data, which can be difficult and expensive to collect. However, recent research has shown that some features learned by DNNs transfer well to other tasks and datasets. We propose a multilingual training procedure that exploits the transferability of acoustic features learned by DNNs to leverage large amounts of off-task data, with the primary goal of improving ASR for low-resource languages. Several networks will be trained on different languages, and the transferability of their learned features will be assessed by substituting layers across networks. The training procedure will reserve the limited labelled data primarily for learning the features that cannot be learned from other datasets. This project will contribute to the intern’s ongoing PhD work on the nature of auditory representations for natural sound and improve Nuance’s existing ASR systems.
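
The layer-substitution test described above can be illustrated with a minimal sketch. It is not the project's implementation: the networks are small, untrained feed-forward models with random weights standing in for per-language acoustic models, and the helper names (init_mlp, forward, substitute_layers) are hypothetical. The sketch shows only the mechanics of building a hybrid network from the lower layers of one network and the upper layers of another.

```python
import math
import random


def init_mlp(layer_sizes, seed):
    """Build a small feed-forward network as a list of (weights, biases) layers.

    Placeholder for a network trained on one language; weights here are random.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Run one input vector through the network (ReLU on hidden layers only)."""
    h = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        is_output_layer = i == len(layers) - 1
        h = z if is_output_layer else [max(0.0, v) for v in z]
    return h


def substitute_layers(target, donor, k):
    """Return a hybrid network: the first k layers from donor, the rest from target.

    Assumes both networks share the same layer sizes, so the layers are
    interchangeable.
    """
    if not 0 <= k <= len(target) or len(donor) != len(target):
        raise ValueError("k out of range or networks have different depths")
    return donor[:k] + target[k:]


# Hypothetical stand-ins for networks trained on two different languages.
net_a = init_mlp([4, 8, 8, 3], seed=0)
net_b = init_mlp([4, 8, 8, 3], seed=1)

# Hybrid: first hidden layer from language A, remaining layers from language B.
hybrid = substitute_layers(net_b, net_a, k=1)
outputs = forward(hybrid, [0.5, -1.0, 0.25, 2.0])
```

In an actual experiment, one would presumably score each hybrid on held-out data from the target language for every value of k. A small drop in accuracy relative to the unmodified target network would suggest that the first k layers hold features that transfer across languages.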

Jessica Thompson
Faculty Supervisor: Yoshua Bengio
Partner University: