Generative Adversarial Neural Network for Viseme Creation - ON-463

Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical Sciences
Company: MemeSpeak Inc
Project Length: 4 to 6 months
Preferred start date: 06/01/2021
Language requirement: English
Location(s): Toronto, ON, Canada; Canada
No. of positions: 1
Search across Mitacs’ international networks - check this box if you’d also like to receive profiles of researchers based outside of Canada: No

About the company: 

MemeSpeak is a Toronto-based software company that has developed a web-based, multi-user, multi-language, IP-based text-to-speech messaging application. The application is cloud-based and supports conversion of text to natural speech in several Asian and European languages. The generated speech is used to animate speaking faces, which are archived and can be shared as streamable video messages on social networks or embedded in text messages. The iOS and Android mobile apps allow message sharing through any messaging apps a user may have on their handset. The system does not depend on proprietary Google, Apple, Amazon, or Facebook APIs. Each user maintains their own YouTube-like messaging archive and can distribute their messages however they wish. Non-users of the system can view the messages as well as subject-matter-related message playlists (see www.memespeaker.com).

The GAN library would both improve viseme-based facial animation and serve as a utility that fully automates the viseme creation process for MemeSpeak or other potential animation applications.

Please describe the project.: 

The goal of the project is to develop a generative adversarial network (GAN) application that can automatically create English-language visemes (visual phonemes) for text-to-speech animation from a single input image of a real human face. The application will be designed to run on Linux servers. Visemes are the visual counterparts of phonemes and are used in all speech-driven facial animation systems. Input to the system would be a single human face, i.e. a “selfie” from a mobile app.
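
As a rough illustration of the kind of model involved, the sketch below shows a minimal viseme-class-conditioned GAN in PyTorch: a generator that encodes the input face, injects a viseme-class embedding, and decodes a viseme frame, paired with a discriminator that judges (face, frame, class) triples. PyTorch itself, the 64x64 working resolution, the layer sizes, and the viseme class count are illustrative assumptions, not project requirements.

# Minimal sketch (not MemeSpeak's actual architecture): a viseme-class-conditioned
# GAN that maps a single face image plus a viseme id to a viseme frame.
# The resolution, channel counts, and viseme class count below are assumptions.
import torch
import torch.nn as nn

NUM_VISEMES = 14   # assumed viseme inventory size; real inventories vary
IMG_CHANNELS = 3
IMG_SIZE = 64      # assumed working resolution

class VisemeGenerator(nn.Module):
    """Encodes the input face, concatenates a viseme-class embedding, decodes a frame."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_VISEMES, 16)
        self.encoder = nn.Sequential(
            nn.Conv2d(IMG_CHANNELS, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128 + 16, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, IMG_CHANNELS, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, face, viseme_id):
        feats = self.encoder(face)                      # (B, 128, 16, 16) for 64x64 input
        emb = self.embed(viseme_id)                     # (B, 16)
        emb = emb[:, :, None, None].expand(-1, -1, feats.shape[2], feats.shape[3])
        return self.decoder(torch.cat([feats, emb], dim=1))

class VisemeDiscriminator(nn.Module):
    """Judges whether a (face, viseme frame) pair looks real for a given viseme class."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_VISEMES, IMG_SIZE * IMG_SIZE)
        self.net = nn.Sequential(
            nn.Conv2d(2 * IMG_CHANNELS + 1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * (IMG_SIZE // 4) ** 2, 1),   # raw logit; no sigmoid here
        )

    def forward(self, face, frame, viseme_id):
        cond = self.embed(viseme_id).view(-1, 1, IMG_SIZE, IMG_SIZE)
        return self.net(torch.cat([face, frame, cond], dim=1))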

The project will require network training as well as optimization and coding strategies, and we require expertise and knowledge to determine and implement the most appropriate one. Some strategies have already been considered, as papers on similar, but different, approaches have been published. We also have a viseme dataset, other human-face data, and a test application (memespeaker.com) that can be used to test the viseme-creation library with speech/phoneme inputs for animation.
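
For the training side, one adversarial training step might look like the following minimal sketch, which combines the usual GAN loss with an L1 reconstruction term as is common in image-to-image GAN work. It assumes generator/discriminator modules like the ones sketched above; the L1 weight, loss formulation, and optimizer choices are assumptions, not part of the project specification.

# Minimal sketch of one adversarial training step for generator/discriminator modules
# like those sketched above; the L1 weight and loss formulation are assumptions.
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, face, target_frame, viseme_id, l1_weight=100.0):
    """face, target_frame: (B, 3, H, W) tensors in [-1, 1]; viseme_id: (B,) long tensor."""
    # --- Discriminator update: real (face, target) pairs vs. generated pairs ---
    fake = gen(face, viseme_id).detach()
    d_real = disc(face, target_frame, viseme_id)
    d_fake = disc(face, fake, viseme_id)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Generator update: fool the discriminator and stay close to the target viseme ---
    fake = gen(face, viseme_id)
    d_out = disc(face, fake, viseme_id)
    g_loss = (F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
              + l1_weight * F.l1_loss(fake, target_frame))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Typical optimizer setup (also an assumption, using common GAN defaults):
# opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
# opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))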

The programming language would be Python, and the candidate would use Google Colab for application testing. There are a few GitHub projects that could be used as a code basis, depending on the approach.

Required expertise/skills: 

A relevant computer science background with knowledge of generative adversarial networks.

- Python

- Solid knowledge of ML and its best practices to make strategic decisions

- GANs (training, evaluating performance, deploying models on the cloud)

- Google Colab

- GitHub