Exploring several approaches for detecting out-of-domain/out-of-distribution textual samples

Typical natural language understanding (NLU) models, such as those deployed in virtual assistant (VA) systems, are designed to work for a finite set of predefined domains. Ideally, if the text representation of a spoken utterance does not belong to any supported domain, the NLU model should produce a low-confidence response, and the VA would know to ignore the utterance. In some cases, however, the NLU model is overconfident. The VA then reacts to an utterance that was not meant for it, which causes a bad user experience.

Rejecting out-of-domain utterances is also crucial for arbitration, that is, deciding which agent should handle an utterance. The conversational agents we produce are often made up of multiple smaller agents, developed independently by different teams or companies (e.g., our customers make their own additions to the agents). If each of these agents rejects out-of-domain input reliably, we can more reliably assign each utterance to the correct agent.

The main challenge is that the set of out-of-domain utterances is completely open and cannot be sampled representatively for training. The algorithm should therefore generalize differently for in-system data than for out-of-system data: it must accept unseen variations of supported requests while rejecting inputs that fall outside every supported domain.
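The confidence-based rejection and arbitration described above can be sketched with the common maximum-softmax-probability baseline. This is an illustrative assumption, not the method explored in this project: each agent turns its classifier logits into probabilities, rejects the utterance if the top probability falls below a threshold, and an arbiter hands the utterance to the most confident non-rejecting agent. The threshold value (0.7), the agent names, and the intent labels are all hypothetical. Because this score relies on the model's own confidence, it inherits exactly the overconfidence problem noted above, which is why it serves only as a starting point.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(
    logits: Sequence[float], labels: Sequence[str], threshold: float
) -> Optional[Tuple[str, float]]:
    """Return (intent, confidence), or None if the utterance looks out-of-domain.

    Assumption: the maximum softmax probability is used as the confidence score.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # low confidence: treat the utterance as out-of-domain
    return labels[best], probs[best]


def arbitrate(
    agent_outputs: Dict[str, Tuple[Sequence[float], Sequence[str]]],
    threshold: float,
) -> Optional[Tuple[str, str, float]]:
    """Pick the most confident agent that does not reject the utterance.

    Returns (agent, intent, confidence), or None if every agent rejects it.
    """
    winner: Optional[Tuple[str, str, float]] = None
    for agent, (logits, labels) in agent_outputs.items():
        result = classify_or_reject(logits, labels, threshold)
        if result is None:
            continue  # this agent considers the utterance out of its domain
        intent, confidence = result
        if winner is None or confidence > winner[2]:
            winner = (agent, intent, confidence)
    return winner


if __name__ == "__main__":
    # Hypothetical logits produced by two independently built agents for one utterance.
    outputs = {
        "car_control": ([4.0, 0.5, 0.2], ["navigate", "call", "play_music"]),
        "weather": ([1.0, 0.9, 0.8], ["forecast", "temperature", "wind"]),
    }
    print(arbitrate(outputs, threshold=0.7))
```

In this example the weather agent's probabilities are nearly flat, so it rejects the utterance and the car-control agent wins. Note that comparing raw confidences across separately trained agents assumes their scores are calibrated on a common scale, which in practice is not guaranteed.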

Faculty Supervisor:

Yvonne Coady

Student:

Partner:

Cerence Technologies Inc.

Discipline:

Computer science

Sector:

Technology; Automotive; Artificial Intelligence

University:

University of Victoria

Program: