Speech enhancement and recognition with generative adversarial network

When taking foreign-language tests, people may record their responses against a variety of background noises. The contaminated audio can lead to anomalous results in speech recognition and in automated scoring. Pearson would like to develop a more robust automated speech recognition system that works with both clean and noisy recordings. Audio files are typically 5 to 90 seconds long. Popular software packages exist to address these problems, but their results need to be tested on the particular kinds of input obtained as test responses, which may contain various types of noise, distortion and other complicating factors. Improving these systems would greatly benefit Pearson’s competitiveness in the market and would also contribute to expanding the boundaries of knowledge in speech enhancement.

Faculty Supervisor:

Gerald Penn

Student:

Zibin Yang

Partner:

Pearson Canada Inc.

Discipline:

Computer science

Sector:

University:

University of Toronto

Program: