Applied next-generation AI accelerator algorithm-hardware co-optimization: using quantization, sparsity, and hardware constraints during neural network training
This work explores software and hardware co-optimization for deep neural network (DNN) inference. Once a model is trained to sufficient accuracy, it is deployed to make predictions on new inputs, a stage known as inference. As model quality has improved, these models are increasingly used for tasks such as machine translation, self-driving cars, and speech recognition, greatly increasing the demand for high-performance inference hardware. The goal of this project is to investigate novel techniques that reduce latency and power consumption during inference while maintaining model accuracy.
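The two main techniques named in the title can be sketched concretely. The following is a minimal NumPy illustration (not the project's actual training code, and the function names are placeholders) of symmetric per-tensor int8 fake quantization and magnitude-based weight pruning, the kinds of constraints such co-optimization applies to weights during training so that the trained model matches what the accelerator can execute efficiently:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 fake quantization: quantize, then dequantize.

    Training continues on the dequantized weights so the network learns to
    tolerate the reduced precision of the target hardware.
    """
    scale = np.max(np.abs(w)) / 127.0          # map largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale, scale  # dequantized weights, step size

def prune_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(w.size * sparsity)
    thresh = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -1.0
    mask = np.abs(w) > thresh                   # keep only the larger weights
    return w * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)

w_q, scale = quantize_int8(w)
err = np.max(np.abs(w - w_q))                   # bounded by half a quantization step

w_p, mask = prune_magnitude(w, sparsity=0.5)    # half the weights become zero
```

In a quantization- or sparsity-aware training loop, these constraints would be applied to the weights on each forward pass, so that inference on low-precision, sparse hardware sees no accuracy surprise relative to training.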