Validating and improving Large Language Models for procedural tasks

Engineering organisations like Thales rely on large quantities of technical knowledge. When resolving a technical
problem, for example, users have to follow a multi-step procedure whose steps may be described at varying
levels of detail, may be out of date, or may not target the exact problem they are facing. Recent progress in
Large Language Models (LLMs) has shown that these models can reason over procedural knowledge, but it
is still very difficult to evaluate whether they will be able to support users in executing complex procedural tasks
across various scenarios. This project will address this research question by creating evaluation tasks for procedural
knowledge in order to test the performance of LLMs, namely ChatGPT. These tasks are expected to form the
foundation of an API for LLMs applied to procedural knowledge, and will generate performance metrics that will
enable us to identify gaps in LLM abilities. These gaps will be addressed in a subsequent project.
7.3. Participants

Faculty Supervisor:

Bang Liu

Student:

Partner:

Thales Canada Inc (Montreal, QC)

Discipline:

Computer science

Sector:

Management of companies and enterprises; Manufacturing; Professional, scientific and technical services

University:

Université de Montréal

Program: