L2M – LeanPrompt: Accelerating Generative AI with Smarter Resource Utilization

We aim to develop and implement a cost-efficient framework for communicating with proprietary generative AI platforms such as ChatGPT. The companies behind these platforms expose their models through APIs, and each call incurs a cost that depends on the size of the request and on the model used. Smaller models are generally cheaper than larger ones, but they are also less capable and may not generate accurate responses. Hence, we aim to reduce cost while maintaining the accuracy and latency of our product. We will implement this approach with three mechanisms. First, we compress the input, because longer queries cost more (pricing is based on input length, typically counted in tokens, which grows with the number of words). Second, we use a routing mechanism that sends simpler tasks to smaller models and more complex tasks to larger models. Lastly, we use a caching mechanism to reuse previously generated answers and avoid invoking a model for every request. Together, these techniques should reduce cost while preserving the quality of the responses. As a result, the partner organization can spread its budget across more objectives and engage more users on its platforms by providing accurate responses.
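To make the three mechanisms concrete, the following is a minimal Python sketch of how compression, routing, and caching could be chained in front of a model API. It is an illustration under stated assumptions, not the project's implementation: the model names `small-model` and `large-model`, the filler-word list, the word-count threshold, and the `fake_backend` function standing in for a real provider API are all hypothetical.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical filler words; a real compressor would be far more careful.
FILLERS = {"please", "kindly", "basically", "actually"}


def compress(prompt: str) -> str:
    """Naive compression: collapse whitespace and drop a few filler words."""
    words = [w for w in prompt.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(words)


def route(prompt: str, threshold: int = 30) -> str:
    """Toy complexity heuristic: short prompts go to the cheaper model."""
    return "small-model" if len(prompt.split()) < threshold else "large-model"


class LeanClient:
    """Chains compression, an exact-match cache, and routing."""

    def __init__(self, call_model: Callable[[str, str], str]):
        self.call_model = call_model      # (model_name, prompt) -> answer
        self.cache: Dict[str, str] = {}   # hash of compressed prompt -> answer
        self.calls = 0                    # number of paid model invocations

    def ask(self, prompt: str) -> str:
        compact = compress(prompt)
        key = hashlib.sha256(compact.lower().encode("utf-8")).hexdigest()
        if key in self.cache:             # cache hit: no model call, no cost
            return self.cache[key]
        model = route(compact)
        self.calls += 1
        answer = self.call_model(model, compact)
        self.cache[key] = answer
        return answer


def fake_backend(model: str, prompt: str) -> str:
    """Stand-in for a real provider API (assumption for this sketch)."""
    return f"[{model}] {len(prompt.split())} words"


if __name__ == "__main__":
    client = LeanClient(fake_backend)
    print(client.ask("Please   summarize this text"))
    print(client.ask("please summarize this text"))  # served from the cache
    print("paid calls:", client.calls)
```

In this sketch the second request differs only in casing and filler words, so it is answered from the cache and only one paid call is made. A production system would likely replace the word-count heuristic with a learned router and the exact-match cache with a semantic one.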

Faculty Supervisor:

Tushar Sharma

Student:

Partner:

Springboard Atlantic Inc.

Discipline:

Computer science

Sector:

Artificial Intelligence; Clean Technology

University:

Dalhousie University

Program: