Development of a smartphone-based acoustic environment tracker for speech enhancement applications

Speech signals propagating in enclosed environments are distorted by two main environment-related factors: a) multiple reflections of the signal from the walls and other objects in the room, which cause coloration (early reflections) and reverberation (late reflections), and b) competing acoustic signals from sound sources other than the speaker, called background noise. Such distortions not only degrade the perceived speech quality and intelligibility for human listeners (whether they hear the distorted speech directly, over a telephone, or through an assistive listening device), but also hamper automatic speech and speaker recognition systems. To mitigate these effects, speech enhancement algorithms have been widely used; in automatic speech/speaker recognition applications, acoustic models matched to the environmental characteristics are used as well.

Several methods exist for experimentally measuring the effect of environmental distortions given a clean reference signal, but they cannot be used in real-time applications, where a reference signal is seldom available. Therefore, so-called blind measures (i.e., measures that do not require a reference signal) have to be employed.

We have recently proposed non-intrusive speech quality, intelligibility, and reverberation time estimation measures. These measures were shown to accurately estimate speech quality/intelligibility across noise-only, reverberation-only, and noise-plus-reverberation listening conditions. Adapted versions of these metrics were also shown to estimate speech quality and intelligibility in complex listening environments for hearing aid and cochlear implant users. Their performance was in line with that of state-of-the-art measures, with the added benefit of not requiring access to a clean reference signal.

Automatically assessing acoustic environment characteristics can help improve the performance of speech enhancement algorithms. Most speech enhancement methods extract low-level features from the distorted speech signal, such as the estimated signal-to-noise ratio, as a proxy for the amount of distortion present in the signal, and rely on this information to adjust how the algorithm works. Higher-level characteristics, such as reverberation time and speech quality/intelligibility, have only recently begun to be explored.
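To make the notion of a low-level feature concrete, the following is a minimal sketch of a blind signal-to-noise ratio estimate. It is not one of the measures proposed in this project: it uses a simple heuristic, assumed here purely for illustration, in which the lowest-energy frames of a recording are treated as noise only. The function names, the frame length, and the noise fraction are all hypothetical choices.

```python
import math
import random


def frame_powers(samples, frame_len=256):
    """Mean power of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def estimate_snr_db(samples, frame_len=256, noise_fraction=0.1):
    """Crude blind SNR estimate in dB.

    Assumption (illustrative heuristic): the quietest `noise_fraction`
    of the frames contain noise only, so their mean power approximates
    the noise power. No clean reference signal is needed.
    """
    powers = sorted(frame_powers(samples, frame_len))
    if not powers:
        raise ValueError("need at least one full frame")
    n_noise = max(1, int(len(powers) * noise_fraction))
    noise_power = sum(powers[:n_noise]) / n_noise
    total_power = sum(powers) / len(powers)
    eps = 1e-12  # guards against log of zero and division by zero
    signal_power = max(total_power - noise_power, eps)
    return 10.0 * math.log10(signal_power / max(noise_power, eps))


def synthetic_recording(tone_amplitude, noise_std=0.01, seed=0):
    """One second at 16 kHz: noise only in the first half, tone plus noise in the second."""
    rng = random.Random(seed)
    n = 16000
    out = []
    for i in range(n):
        noise = rng.gauss(0.0, noise_std)
        tone = tone_amplitude * math.sin(2 * math.pi * 440 * i / 16000) if i >= n // 2 else 0.0
        out.append(noise + tone)
    return out


if __name__ == "__main__":
    for amplitude in (0.05, 0.5):
        snr = estimate_snr_db(synthetic_recording(amplitude))
        print(f"tone amplitude {amplitude}: estimated SNR {snr:.1f} dB")
```

An estimate of this kind captures only the relative level of the noise. It says nothing about reverberation or perceived quality, which is the gap the higher-level measures are meant to fill.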

In this project, we aim to develop environment-aware speech enhancement algorithms that take into account the predictions of our blind measures of acoustic environment characteristics. As a first step towards using these measures in speech enhancement algorithms, it is important to understand how they behave in real-world, time-varying environments. To that end, we will develop a smartphone application that tracks the evolution of our blind measures over time. The application will periodically record audio segments, compute the measures, and log them. The user will be able to tag measurements as corresponding to specific places (e.g., inside a room, in an automobile, on the street) and annotate them with comments and quality scores. The information stored by the application will later be analysed by researchers and used to detect possible limitations of the blind measures. The application will later be extended to perform environment-aware speech enhancement as well; however, this is outside the scope of this short-term project.
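The record, compute, log, and annotate cycle described above can be sketched as follows. This is only an illustration under stated assumptions, not the application's actual design: the record fields, the JSON Lines log format, the `track` function, and the metric name `rt60_s` are hypothetical, and audio capture and the blind measures themselves are passed in as placeholder callables.

```python
import io
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Optional, Sequence, TextIO


@dataclass
class MeasurementRecord:
    """One logged measurement (hypothetical schema)."""
    timestamp: float
    measures: Dict[str, float]           # e.g. {"rt60_s": 0.5}; names are assumptions
    place: Optional[str] = None          # user tag such as "office" or "street"
    comment: Optional[str] = None        # free-text user annotation
    quality_score: Optional[int] = None  # user-provided quality rating


def log_record(record: MeasurementRecord, sink: TextIO) -> None:
    """Append one record to the log as a single JSON line."""
    sink.write(json.dumps(asdict(record)) + "\n")


def track(
    n_segments: int,
    record_segment: Callable[[], Sequence[float]],
    compute_measures: Callable[[Sequence[float]], Dict[str, float]],
    sink: TextIO,
    interval_s: float = 30.0,
    place: Optional[str] = None,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[MeasurementRecord]:
    """Record a segment, compute the measures, log them, then wait; repeat."""
    records = []
    for i in range(n_segments):
        segment = record_segment()
        record = MeasurementRecord(
            timestamp=clock(),
            measures=compute_measures(segment),
            place=place,
        )
        log_record(record, sink)
        records.append(record)
        if i < n_segments - 1:  # no need to wait after the last segment
            sleep(interval_s)
    return records


if __name__ == "__main__":
    log = io.StringIO()
    track(
        n_segments=2,
        record_segment=lambda: [0.0] * 160,             # placeholder for audio capture
        compute_measures=lambda seg: {"rt60_s": 0.5},   # placeholder for a blind measure
        sink=log,
        interval_s=0.0,
        place="office",
    )
    print(log.getvalue(), end="")
```

Writing one self-describing line per measurement keeps the log easy to append to on the device and easy to parse later, when the measurements are analysed against the user's tags and quality scores.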

Faculty Supervisor:

Tiago Falk

Student:

Blas Kolic

Partner:

Discipline:

Journalism / Media studies and communication

Sector:

University:

Program:

Globalink