Speech enhancement and recognition with generative adversarial network

When taking foreign language tests, people may record their responses in different environments and with different equipment, and the recordings are sometimes unclear. Low-quality audio can lead to unusual results in speech recognition and in scoring by the scoring systems. Audio with a higher resolution (sample rate) contains richer information, since a greater range of frequencies can be represented in the data. This captures a greater level of detail and texture, such as sibilants and fricatives, and produces higher-quality audio. Pearson would like to improve the quality of its existing data by transforming the lower-resolution audio in the dataset into higher-resolution audio, in order to train and develop a more robust automated speech recognition system. Improving these systems would greatly benefit Pearson’s competitiveness in the market and would also contribute to expanding the boundaries of knowledge in speech enhancement.

Jinda Huang
Faculty Supervisor: 
Gerald Penn
Partner University: