Evaluating LLMs for Sentence Encoding and Clustering to support Thematic Analysis in Qualitative Research

The project “Evaluating LLMs for Sentence Encoding and Clustering to support Thematic Analysis in Qualitative Research” aims to give qualitative researchers access to advanced machine learning techniques for analyzing social media data while preserving their autonomy over, and ownership of, that analysis. By benchmarking modern language models and large language models on established metrics such as coherence and topic diversity, the toolkit bridges the gap between technical expertise and qualitative research needs. This approach broadens the impact of computational tools across domains such as health, education, and governance.
The expected outcome of this project is a report detailing the findings from the benchmarking analysis and user feedback. The report will provide insights into the performance of large language models on thematic analysis tasks and outline recommendations for researchers and practitioners seeking to adopt computational tools for qualitative data analysis. In doing so, it will contribute to the advancement of knowledge in computational social science.

Faculty Supervisor:

Jim Wallace

Student:

Partner:

National University of Kyiv-Mohyla Academy

Discipline:

Computer science

Sector:

Artificial Intelligence; Information and Communications Technology

University:

University of Waterloo

Program: