Improving the Performance and Convergence Rate of Transformer-Based Language Models
The pre-trained Bidirectional Encoder Representations from Transformers (BERT) model has proven to be a milestone in language modeling, achieving new state-of-the-art results on many Natural Language Processing (NLP) tasks. Despite its success, there is still considerable room for improvement, both in training efficiency and in structural design. The proposed research project would examine BERT's design decisions at multiple levels and optimize them wherever possible. The expected result is an improved language model that achieves higher performance on NLP tasks while using fewer computational resources.