Distillation
Distillation is the process of using a larger model (the teacher) to train a smaller model (the student). This has been shown to be more effective than training the small model from scratch [1]. It can involve using intermediate states or output distributions of the larger model as training targets for the smaller model, or using a large generative model to produce new text on which the smaller model is trained.
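As a rough illustration, below is a minimal sketch (in PyTorch) of the classic logit-matching variant: the student is trained against a blend of the teacher's temperature-softened output distribution and the ordinary hard-label loss. The function name and the hyperparameters `T` and `alpha` are illustrative choices, not part of any particular library's API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft loss (match the teacher's softened distribution)
    with the usual hard-label cross-entropy."""
    # Soften both distributions with temperature T; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In this setup the teacher's logits are computed once per batch with gradients disabled, and only the student's parameters are updated; `alpha` controls how much the student imitates the teacher versus fitting the ground-truth labels directly.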