End-to-end training with DeterminedAI and the Titan Takeoff Inference Server: From model training to efficient deployment
Seamless data scientist tools for modern deep learning
In today’s fast-evolving AI landscape, the role of a data scientist is both exhilarating and demanding. With foundation models paving the way for breakthroughs across text, image, and multimodal applications, the horizon is limitless. However, there’s a caveat: the burgeoning scale of these foundation models brings infrastructural challenges unheard of in previous years.
Modern data scientists find themselves juggling two extremes — mastering the infrastructural intricacies associated with training massive models, while simultaneously diving deep into the underlying data.
The solution? Equip these professionals with robust tools tailored for the evolving demands of deep learning. This is where DeterminedAI and the Titan Takeoff Inference Server shine. These platforms seamlessly bridge the gap between training and deployment, ensuring that data scientists can maintain their core focus.
Join us as we explore the nuances of fine-tuning a generative model, such as GPT2, using Determined, and then deploying it efficiently on GPU with int8 quantization through Titan Takeoff.
Step-by-step guide: From training with DeterminedAI to deployment with Titan Takeoff Inference Server
For a clearer understanding, let’s dive into a hands-on example that encompasses:
- Initiating GPT2 model training via DeterminedAI.
- Retrieving the saved checkpoint and molding it into a format compatible with Titan Takeoff.
- Streamlined model deployment using Titan Takeoff.
The beauty of DeterminedAI and Titan Takeoff lies in their user-friendliness. Minimal coding and configuration are needed to hit the ground running. The most intricate aspect is ensuring that the weight mapping from Determined aligns seamlessly with Titan Takeoff’s requirements. So, without further ado, let’s navigate this comprehensive guide on transitioning from DeterminedAI to Titan Takeoff.
Step 1: Model training
Prerequisites: Setting Up the Environment for DeterminedAI and Titan Takeoff
To set up the necessary packages, use the following pip commands:
pip install determined
pip install titan-iris
Deploy a Local Cluster with DeterminedAI using the following command:
det deploy local cluster-up
Once the cluster is active, navigate to localhost:8080 to view all experiment data and cluster details.
Default Login Details:
- Username: admin
- Password: none (leave the password field blank)
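You can also authenticate the Determined CLI directly from the terminal. A minimal sketch, assuming the default admin account on a fresh local cluster (the command prompts for a password, which you can leave blank):
det user login admin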
For this demonstration, we’ll utilize one of DeterminedAI’s examples: a GPT2 fine-tuning experiment. This fine-tunes GPT2 on wikitext-2, a treasure trove of data scraped from Wikipedia.
Begin by downloading the language-modeling archive and extracting its contents. On Linux, you can do this using tar:
tar zxvf language-modeling.tgz
This will extract the contents to a folder called language-modeling. Navigate to the folder; inside there should be a config file: language-modeling/clm_config.yaml. Be sure to set the correct number of GPUs for your machine in clm_config.yaml:
resources:
  slots_per_trial: <allocate your GPU count here>
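For example, on a machine with two GPUs, the relevant excerpt would look like this (illustrative values; leave the rest of clm_config.yaml as shipped with the example):
resources:
  slots_per_trial: 2  # one slot per available GPU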
To initiate the fine-tuning job, navigate to the folder with the yaml and run the experiment create command:
cd language-modeling
det experiment create -f clm_config.yaml .
After the task is successfully dispatched, we can view all the training info and stats on the same dashboard as before at localhost:8080, and track training progress from there. Then simply wait for the training to complete.
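If you prefer the terminal over the dashboard, the Determined CLI can report the same status. A quick sketch, assuming a recent version of the det CLI:
det experiment list                      # shows each experiment and its current state
det experiment describe <experiment_id>  # detailed view of a single experiment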
Step 2: Model conversion
Now that training is done, we need to download the model.
Navigate to the Checkpoints tab under your experiment and make a note of the UUID of the checkpoint you would like to download.
Use the following command to download the model:
det checkpoint download <your_checkpoint_model_uuid>
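If you’d rather not hunt for the UUID in the UI, the CLI can list the checkpoints for an experiment. A sketch, assuming a recent det CLI, where <experiment_id> is the ID shown on the dashboard:
det experiment list-checkpoints <experiment_id>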
Conversion to the HuggingFace Model Format
Once downloaded, the model can be found in the checkpoints folder, identified by its unique UUID. The model weights are saved in the state_dict.pth file. Our next task is to convert this model into the HuggingFace format. This is done by initialising a model using the HuggingFace transformers package, loading the weights into the model class, and then saving it to a directory.
Here’s a script to guide you through this process:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
checkpoint = torch.load('checkpoints/<your_checkpoint_model_uuid>/state_dict.pth')
model_state_dict = dict(checkpoint['models_state_dict'][0])

# Remove attention bias buffers that GPT2LMHeadModel does not expect
# (GPT-2 small has 12 transformer blocks, hence range(12))
unexpected_keys = ["transformer.h." + str(i) + ".attn.bias" for i in range(12)] + ["transformer.h." + str(i) + ".attn.masked_bias" for i in range(12)]
for key in unexpected_keys:
    if key in model_state_dict:
        del model_state_dict[key]
model = GPT2LMHeadModel.from_pretrained('gpt2') # Instantiate GPT-2 model
model.load_state_dict(model_state_dict) # Load your weights
tokenizer = GPT2Tokenizer.from_pretrained('gpt2') # Instantiate tokenizer
model.save_pretrained('gpt2_hf') # Save model to gpt2_hf
tokenizer.save_pretrained('gpt2_hf') # Save tokenizer to gpt2_hf
Once the model and tokenizer are saved in the ‘gpt2_hf’ directory, the conversion phase is complete!
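Before deploying, it’s worth a quick sanity check that the converted weights load and generate sensible text. A minimal sketch using the HuggingFace transformers API (the prompt is arbitrary):
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Reload the converted model and tokenizer from the gpt2_hf directory
model = GPT2LMHeadModel.from_pretrained('gpt2_hf')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2_hf')

# Generate a short continuation to confirm the weights behave sensibly
inputs = tokenizer('Wikipedia is a free online encyclopedia', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))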
Step 3: Model deployment
Moving the Fine-tuned Model
Deploying the fine-tuned model on the Titan Takeoff Inference Server is easy. Begin by moving or copying the model’s folder to ~/.iris_cache. On a Linux system, this can be accomplished with:
cp -r gpt2_hf ~/.iris_cache
Deploying with Titan Takeoff
To deploy your model, simply use the following command:
iris takeoff --model gpt2_hf --device cuda
The --device flag is optional. If you omit this argument, the model will run on the CPU by default. Once executed, you'll have an optimised GPT-2 model running on your local server.
Using the Titan Takeoff Inference Server, you can deploy with a range of quantization types, from bfloat16 to int4, on CPU or GPU, optimised for throughput or latency as needed. This makes it easy to take full advantage of the hardware available to data science teams and to build high-performance, scalable applications on top of LLMs.
Inferencing a Model
You can inference the model using the API:
curl http://localhost:8000/generate_stream \
  -X POST \
  -N \
  -H "Content-Type: application/json" \
  -d '{"text":"List 3 things you can do in London"}'
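The same endpoint can be called from Python. A minimal sketch using the requests library, assuming the server streams back plain-text chunks as in the curl example above:
import requests

# Stream the generated tokens from the Takeoff server as they arrive
response = requests.post(
    'http://localhost:8000/generate_stream',
    json={'text': 'List 3 things you can do in London'},
    stream=True,
)
for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode('utf-8'), end='', flush=True)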
Or use the Playground interface at localhost:8000/demos/playground.
For more detailed guidance and advanced features, refer to the Titan Takeoff Docs.
Wrapping up
To wrap things up, we have seen how the seamless integration of DeterminedAI and Titan Takeoff allows for a smooth transition from model training to deployment. With a three-phase process covering training, conversion, and deployment, users can easily go from a DeterminedAI-trained model to being fully operational with Titan Takeoff. Keep this guide handy, and remember: from training wheels to full throttle, your model’s journey has never been this streamlined! 🚀
About TitanML
TitanML enables machine learning teams to effortlessly and efficiently deploy large language models (LLMs). Their flagship product, the Titan Takeoff Inference Server, is already supercharging the deployments of a number of ML teams.
Founded by Dr. James Dborin, Dr. Fergus Finn and Meryem Arik, and backed by key industry partners including AWS and Intel, TitanML is a team of dedicated deep learning engineers on a mission to supercharge the adoption of enterprise AI.
Our documentation and Discord community are here for your support.
Written by Yicheng Wang