Using Machine Learning for audio analysis and synthesis

Voices.com, the largest online marketplace for voice talent, has identified Machine Learning as an enabler of
future growth, in particular by incorporating Natural Language Processing (NLP) into structured queries and by
automatically classifying sample recordings. The first phase of this research, which involves NLP, is in the process of
being integrated into production. The second phase will automatically classify sound samples. This task has
historically been difficult and has yielded low accuracy, but we will take advantage of new ML techniques and one
of the world's largest databases of tagged audio. The classification will cover areas of current research, such as
gender and age detection, and will extend to new areas, including style and emotion. Once this classification is
complete, we will be able to incorporate emotion into voice synthesis, increasing the acceptance and usability
of Voice AI.

Faculty Supervisor:

Christopher Anand

Student:

Partner:

Voices

Discipline:

Computer science

Sector:

Information and Communications Technology; Technology; New and Digital Media

University:

McMaster University

Program: