Enhancing Knowledge Self-Awareness in Large Language Models through Self-Explanation and Self-Reflection

Large language models (LLMs) are becoming ubiquitous because they excel at many tasks. However, a comprehensive understanding of their internal mechanisms has yet to emerge, in particular with regard to how LLMs use the knowledge they possess. Previous work has suggested that although LLMs possess knowledge, they are often unable to use it directly. Other work has proposed self-explanations as a way to induce LLMs to rationalize and explain their behaviour, enabling the analysis of LLMs even without explicit access to, or understanding of, their internal dynamics. To this end, self-explanations also provide an avenue for examining whether LLMs understand the scope of their knowledge and whether they can use this information to produce factually correct responses, even when such information is not directly known. By using LLMs within self-reflective loops, we can uncover how LLMs understand the information they have been trained on, how they access this information for downstream purposes, and whether they can recognize when necessary information is missing. This would provide an improved understanding of LLMs and allow them to be used more reliably in the future.

Faculty Supervisor:

Sarath Chandar Anbil Parthipan

Student:

Partner:

The University of Tokyo

Discipline:

Computer science

Sector:

Education

University:

Université de Montréal

Program: