Spoken Language Identification for Children

When taking foreign-language tests, people may respond in a language other than the expected one. Typical scoring systems are trained only on the expected language, so responses in an unexpected language can produce unusual results in speech recognition and scoring. Pearson would like to develop a more robust system that lets the automated speech recognizer know up front whether a response contains non-target-language content. Common language labels are English, Spanish, Chinese, Japanese, etc. Audio files are typically 5 to 90 seconds long. Popular software packages exist to address this problem, but their results need to be tested on the particular kinds of input obtained as test responses. These responses may feature strong accents, children’s speech, and various other complicating factors. Improving these systems would greatly benefit Pearson’s competitiveness in the market and would also help expand the boundaries of knowledge in speech processing.

Faculty Supervisor: Gerald Penn

Aravind Varier

Pearson Education Canada

Computer science

University of Toronto


